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Preface 


The importance of numerical analysis has increased considerably in the 
last few years. Today no university course in Mathematics is complete 
without at least an introduction to the subject being given. It is a subject 
which can be made appealing to students at various levels of sophistication 
and a carefully prepared introduction can be a suitable course for sixth 
year pupils in schools. For these pupils the subject must not be reduced 
to the mere mechanical evaluation of formulae. This approach makes the 
subject artificially dull and may do lasting harm to the pupils. It may 
put them off completely from studying further a subject which has so much 
to offer which is intellectually stimulating. Pupils must understand the 
theoretical aspects of the subject. However, the theory must not be over¬ 
emphasized at the expense of losing sight of the practicality of the subject. 
A middle course must be steered between theory and computation. It is 
hoped that this text will help in this aim. 

Another danger, when introducing the subject, is to give the impression 
that everything is cut and dried so that no interesting problems remain. 
This too tends to kill interest in the subject. It is hoped that, in particular, 
the brief mention of ill-conditioning at various points will help to dispel 
this impression. 

It must be emphasized too that numerical analysis is very closely linked 
up with computers. In a real situation the computer is used to carry out 
the (otherwise tedious) numerical computations. The reader should there¬ 
fore attempt to make full use of any computing facilities available to him 
when doing many of the exercises given in the text. 

The text has been written with the content of the relevant syllabus for 
sixth year studies in Scottish schools firmly in mind. It is based in part on 
a series of lectures given annually for several years at a school teachers’ 
summer school run jointly by Dundee College of Education and Dundee 
University. It is hoped that the text may also be helpful to students 
attending first year courses at Colleges and Universities. Some background 
material which may not be familiar to all readers is included in appendices. 

My thanks are due to Dr D P Thomas, University of Dundee, for many 
helpful discussions and to Mr A McMeeken, College of Education, Dr J M 



Rushforth, University of Dundee and Mr N Smart, Aberdeen College of 
Education who read the initial manuscript very thoroughly and made 
many valuable suggestions. As a result of these discussions and the 
amendments made the final text should be more acceptable to the students 
for whom it is intended. 
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Introduction 


IN AN ATTEMPT TO GAIN A CLEARER UNDERSTANDING OF A PROBLEM IN 

the physical or biological sciences, engineering or the social sciences, the 
applied mathematician first tries to contruct a mathematical model of 
the given situation. He tries to express the essential features of the problem 
in terms of a series of equations or inequations involving the relevant 
variables. These may be temperature and pressure in a physical problem, 
or buying and selling prices in an economics problem. Once he has 
obtained this mathematical model (and this is normally a very difficult 
task) his next step is to solve the equations or inequations of the model. 
If he is lucky, by using analytical methods he can perhaps obtain his 
solution in terms of a formula or set of formulae. Subsequent examination 
of these formulae might reveal a considerable amount of qualitative 
results. For example, it might be revealed that the effects of, say, tempera¬ 
ture are very small, while the pressure effects are relatively large. However, 
quantitative results may also be required, that is, numerical values may be 
required for some of the variables in the problem in certain situations. 
These can be obtained by substituting specific values in the formulae 
representing the solution. This process is not always as easy as it may 
appear at the moment, and indeed the accuracy of our final results may 
depend to a considerable extent on the way in which this is done. Choosing 
the best way in a given case (and carrying it out) is a problem in numerical 
analysis. 

Example. Determine the root of the equation 2x 2 + 3x -1 = 0 which lies 
between 0 and 1. 

(i) Proceeding by an analytic method we can complete the square in x on 
the left-hand side of the given equation to obtain 

2(x +1) 2 — 2(f) 2 — 1=0 


Hence 


Then 


(*+!) 2 = H 

X=-1±^ 
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Therefore the required root is given by — and this can now be 

evaluated to any required accuracy provided Jll can be obtained to the 
appropriate accuracy. 

(ii) Proceeding by a numerical method we can rewrite the given equation 
in the form 

x = i(l-2x 2 ) 

Now, starting with the approximation 0 to the required root, we can 
obtain a better approximation to the root by putting x = 0 in the right- 
hand side of the above equation. Thus we obtain the better approximation 
|(1 — 2 x 0) = Repeating the process we obtain the still better approxi¬ 
mation ^(1 — 2(j) 2 ) = jj. By repeating the process often enough we can, in 
this case, evaluate the root to any required accuracy. 


Numerical processes of this type are called iterative processes and will 
be discussed more fully in Chapter 7. 

At this stage the reader may well be of the opinion that in this example 
the analytical method is much superior to the numerical method. However, 
it must be pointed out that for equations of higher degree than the third, 
no analytical method corresponding to the above exists, whereas iterative 
methods similar to the above will still be applicable for determining a root. 

In some cases it may be extremely difficult (or even impossible) to obtain 
a solution for a given problem in terms of a formula or set of formulae 
using analytical methods, and in such cases it is necessary to use a 
numerical technique right from the start. Doing this we go straight to the 
quantitative results, that is, to the numerical values of our solution in 
various circumstances. If we compute sufficient of these, it may then also 
be possible to deduce some qualitative results. 

But numerical analysis is not merely a tool which is to be used when all 
else has failed. It may be that in certain problems it is preferable to use 
numerical analysis from the start and to go directly to the quantitative 
results, even when it is known that analytical methods would lead to a 
solution in the form of a formula or set of formulae. This decision might 
be reached simply because it is easier or faster (as a computer program may 
already exist) to do so. Alternatively, it may be cheaper to do the complete 
problem on the computer using numerical techniques than first to obtain 
a solution using analytical methods and then to use the computer to 
evaluate the resulting formula or formulae. 

For example, after considerable effort, it can be shown that 


I 


b 

a 


1 

1+X 4 


dx = 


1 

V2 


In 


x 2 + xj2+ l\ 
K x 2 — xj2+l) 


+ 2 tan 1 
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and clearly a considerable amount of work has still to be done before the 
numerical value of the integral is obtained. In this case it would be 
preferable to evaluate the integral by numerical methods right from the 
start. 

An advantage of numerical analysis is that in a sense the methods used 
are more general than are analytical methods, in that when a solution exists 
it can normally be computed. We can have two similar-looking problems, 
one of which can be solved easily analytically and the other of which 
requires a completely different method, or indeed may not be solvable at 
all analytically. A disadvantage is that using numerical techniques only one 
particular problem is solved at a time; and if another similar problem, 
with perhaps different data, is then to be solved, the whole process of 
solution has generally to be repeated from the beginning. However, with 
computers at their present stage of development this is no longer a serious 
problem. 

It is not always true that the scientist, the engineer and the social 
scientist are solely interested in numerical values for their solution 
function. It may be that they are more interested in the effects of changes 
in the values of several variables on the final result. This more basic 
understanding can also be obtained by using numerical analysis and 
computers, and carrying out a series of numerical experiments. The 
problem is solved several times on the computer with different values for 
the starting data. 

For example, the effect on the climate of certain areas of the earth of 
removing part of the polar ice-cap might be investigated by setting up a 
mathematical model of the general circulation of the atmosphere and then 
solving the resulting equations numerically with different boundary con¬ 
ditions corresponding to different positions of the edge of the ice-cap. 
Physical modelling of this situation in the laboratory is very difficult, and 
clearly any experimentation on the real system would be very costly and, 
more important, might lead to climatic disasters for certain areas. 

No doubt some of the points raised above will only become clear to 
the reader after he has obtained some experience in numerical analysis 
from the subsequent sections. 
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OBTAINING NUMERICAL VALUES FOR THE SOLUTION OF A MATHEMATICAL 

model is only part of the problem. We shall also normally require to know 
just how accurate are the numerical results. This is a very important part 
and is often more difficult than working out the results themselves. 

In calculations there are four types of error which can arise affecting 
the accuracy of the results. These are 

1. round-off errors 

2. errors due to imprecision of the given data 

3. mistakes 

4. errors due to the particular method used. 

We shall now look at each of these in turn. 


2.1. Round-off errors 

When carrying out numerical calculations there is a physical limit to the 
number of digits we can retain in a number; that is, we must round off 
our numbers to a finite number of digits. The error involved in doing this 
is called the round-off error. 

For example (a) f = 1-3333. .. 

To four significant figures f = 1-333 
Round-off error = 1-3333.. .— 1-333 
- 0-000333.. . 


(b) | = 1-6666... 

To four significant figures f = 1-667 
Round-off error = 1-6666.. . — 1-667 
= -0000333... 


(c) The following table shows the results of rounding exact 
numbers to N significant figures. 
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Number 

'n 

Rounded number 

Round-off error 

23-764462 

5 

23-764 

0-000462 

0-0092746 

3 

0-00927 

0-0000046 

75684-3195 

3 

75700 

-15-6805 

1-650045 

5 

1-6500 

0-000045 

0-57386 

3 

0-574 

-0-00014 

0-0003786 

3 

0-000379 

-0-0000004 

1-2545 

4 

1-254 

0-0005 

1-2545 

2 

1-3 

-0-0455 

1-25451 

4 

1-255 

-0-00049 

1-25451 

3 

1-25 

0-00451 

1-25451 

2 

1-3 

-0-04549 

1-25 

2 

1*2 

0-05 

1-35 

2 




Careful study of this table will reveal the danger of rounding to fewer 
significant figures numbers which have already been rounded. 

Now the working of a computer is such that it is convenient for the 
computer to carry out its calculations rounding off to a fixed number of 
significant figures at each stage, and indeed this is often the best approach 
when carrying out a calculation manually also. However, there are 
occasions when it may be necessary for us to use numbers which have 
been obtained to a fixed number of decimal places. It should be emphasized 
that rounding a number to a given number of significant figures will not , 
in general, give the same result as rounding this number to the same 
number of decimal places. 

For example, the following table shows the result of rounding exact 
numbers to N decimal places. 


Number 

N 

Rounded number 

Round-off error 

23-764462 

5 

23-76446 

0-000002 

0-0092746 

3 

0-009 

0-0002746 

75684-3195 

3 

75684-320 

-0-0005 

1-650045 

5 

1-65004 

0-000005 

0-57386 

3 

0-574 

-0-00014 

0-0003786 

3 

0-000 

0-0003786 


The results given here should be compared with the corresponding 
results in the previous table. In particular it should be observed that after 
rounding the last number in this table we do not obtain any significant 
figures at all. 
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Note. When rounding to k decimal places the round-off error must be 
within the limits 


±il(T fe 

For example, when rounding to 3 decimal places the round-off error must 
lie between 0 0005 and —0*0005, that is between ^10~ 3 and —^10~ 3 . 

2.2. Imprecision of the given data 

If the data are obtained experimentally, then they are of course only known 
within the limits of experimental error (which can normally be estimated), 
and this will limit the accuracy of the results of any subsequent calcu¬ 
lations. This apparently obvious fact—that the accuracy of results is 
limited by the accuracy of any initial data—is one that is frequently ignored 
by beginners. 

2.3. Mistakes 

Mistakes occur by misuse of the methods being applied. They are errors 
which are created by the person performing the calculations. A common 
mistake is to invert the order of two digits occurring in a number. For 
example, it is very easy to use the number 62381 instead of the number 
63281. When doing calculations, as many checks as possible should be 
incorporated in the method itself so that any mistakes come quickly to 
light. Mere repetition of a calculation is itself not a good check as it is 
quite likely that the same mistake will be repeated. 

If the computations are to be carried out on a computer, then it is not 
necessary to incorporate checks in the method, provided the computer 
program has been fully checked and it is known that the method being 
used is suitable for the particular problem being tackled. 

2.4. Errors of the niethod 

Errors of the method are due to replacing an exact process by an 
approximate one. For example, we introduce an error if we use only a 
finite number of terms from an infinite series expansion. This error is 
called a truncation error, that is, the error due to truncating the series 
somewhere. 

For example sin x can be expressed as the infinite series expansion* 
x 3 x 5 x 7 x 9 x 11 

*~3! + 5T~7[ + 97 - TTT + '" 


* For positive integers n,n\ = 1.2.3_ (n — l).n. 
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and when x is small the sum of the first three terms, namely x ——+—, 

will give a good approximation to sin x. The truncation error is then the 
sum of the remaining terms of the infinite series expansion namely 

_x^ x^_x^ 

7! + 9! 11! + . 


EXERCISES 

1. Round-off the following (exact) numbers to the number of significant figures 
indicated and state the round-off error introduced in each case. 

(i) 1-7514 to 3 significant figures 

(ii) 0-04795 to 3 significant figures 

(iii) 349-1862 to 3 significant figures 

(iv) 29-34625 to 4 significant figures 

(v) 0-486251 to 4 significant figures 

(vi) 1-6485003 to 4 significant figures 

(vii) 0-0004985 to 3 significant figures 
(viii) 5-0032 to 2 significant figures 

2. Round-off the numbers given in exercise 1 to the number of decimal places 
indicated and state the round-off error introduced in each case. 

(i) 3 decimal places 

(ii) 3 decimal places 

(iii) 3 decimal places 

(iv) 4 decimal places 

(v) 4 decimal places 

(vi) 4 decimal places 

(vii) 3 decimal places 
(viii) 2 decimal places 


2.5. Absolute error, relative error, precision and accuracy 

2.5.1. Absolute error and relative error 

If x represents an approximation to the value X and e is the error in this 
approximation (due to whatever source), then 

X = x + e 


The numerical value of the error e , denoted by \e\, is called the absolute 
e 

error and — is called the relative error. 

A 

For example, 27 may be used as an approximation to the (exact) number 
26-76. In this case X = 26-76 and x = 27. The error e (due to round-off 
in this case) is —0-24, the absolute error is 0-24 and the relative error is 
= 0-009 correct to three decimal places or one significant figure. 
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Each of the concepts of absolute error and relative error is useful in 
different circumstances. For example, to say that the absolute error in some 
number is 0 0004 does not mean very much unless we know just how big 
the number is itself. If the number is approximately 2, the above error 
may not be too important. The relative error is then only 0*0002, that is 
2 parts in 10 000. However, if the given number is approximately 0*002, 
the given absolute error is almost certainly more important. The relative 
error is 2 parts in 10. But if the number being estimated is itself zero, any 
error at all (however small) would give an infinite relative error. In such 
a case the absolute error would be more meaningful. 

In general we shall see that, when adding and subtracting, the absolute 
errors are more useful, whereas when multiplying and dividing the relative 
errors are more useful. 

2.5.2. Accuracy and precision 

Hitherto accuracy and precision have been used almost as synonyms. We 
shall now draw a distinction between the meanings of these two words. 

The precision of a measurement depends upon the least unit of measure¬ 
ment employed. Thus measurements of time correct to the nearest second 
are equally precise, even although they may not involve the same number 
of significant figures. The precision of a measurement is related to the 
absolute error in its value. The smaller the absolute error the greater the 
precision. 

For example (a) 6517*2 seconds and 3*4 seconds each measured correct 
to the nearest 0*1 second are equally precise. ( b ) 1072 seconds and 5 
seconds each measured to the nearest second are also equally precise. 
Both have a possible absolute error 0*5 second, (c) 1072 seconds measured 
to the nearest second is less precise than 5*1 seconds measured to the 
nearest 0*1 second. The least unit of measurement used in the measurement 
5*1 seconds (0*1 second) and the possible absolute error in this value (0*05 
second) are both less than the corresponding quantities for the measure¬ 
ment 1072 seconds (namely 1 second and 0*5 second). 

The accuracy of a measurement depends on the number of significant 
figures in its value (and not on the least unit of measurement). Thus 
measurements of time with the same number of significant figures will be 
equally accurate, even although the least units of measurement used may 
be very different, for example 1 second, 1 hour, 1 day. The accuracy of a 
measurement is related to the relative error in its value. The smaller the 
relative error the greater the accuracy. 

For example (u) 23*7 seconds (to the nearest 0*1 second), 2*37 hours (to 
the nearest 0*01 hour) and 237 days (to the nearest day) are all equally 
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accurate measurements (although their precisions are very different). Note 
that the relative error in each of the above measurements is the same, 
namely jtto- 

(j b ) The measurement 132*5 days (to the nearest 0* 1 day) is more accurate 
than the measurement 2*5 seconds (to the nearest 0T second). Note, how¬ 
ever, that 132*5 days is considerably less precise than 2*5 seconds. 

2.6. The effects of error on the basic operations of arithmetic 

In what follows it will be assumed that the relative errors occurring in the 
numbers used in the basic operations are small. If this assumption is not 
valid, that is, if the error in a number is of the same order of magnitude 
as the number itself, then it is pointless to continue with any error analysis 
such as the following. 

2.6.1. Addition (and subtraction) 

Example. Two lengths measured correct to the nearest 0*1 mm are 3*2 mm 
and 1*6 mm. What is the best estimate we can obtain, from these two 
measurements, of the sum of the two (exact) lengths? Consider the error 
in the estimate. 

The first length lies somewhere in the interval (3*2 ±0*05) mm, that is 
between 3*15 mm and 3*25 mm. 

The second length lies somewhere in the interval (1*6 + 0*05) mm, that 
is between 1*55 mm and 1*65 mm. 

Hence the minimum value for the sum is (3*15 + 1*55) mm = 4*70 mm 
and the maximum value for the sum is (3*25 +1*65) mm = 4*90 mm. Thus 
the value 4*8 mm, obtained by simply adding 3*2 mm and 1*6 mm, looked 
upon as an approximation to the sum of the two exact lengths has a 
possible absolute error of 0*1 mm. Note that this maximum absolute error 
of 0*1 mm is equal to the sum of the maximum absolute errors in the 
original measurements (each of which was 0*05 mm). Thus the maximum 
absolute error in the sum of two measurements is equal to the sum of the 
maximum absolute errors in the two measurements. The estimate of the 
sum can be given as 5 mm correct to the nearest millimetre or as 
(4-8 ±0*1) mm. The result should not be given simply as 4-8 mm, as this 
gives the impression that it is correct to one decimal place. 

In general, if x x and x 2 are approximations to the true values X l and 
X 2 respectively, and e l and e 2 are the corresponding errors in these 
approximations, then 

X i = x j + £ i and X 2 = x 2 + e 2 . 

Therefore X x +X 2 = (x 1 +x 2 ) + (e 1 + e 2 )- 
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Thus the sum (x 1 +x 2 ) of the two approximate values approximates the 
sum (X x + X 2 ) of the two exact values with an error (e x + e 2 ), the sum of 
the individual errors. 

Then \(X x + X 2 )-(x x + x 2 )\ = ki+^l 

^ kil + k 2 l* 

Thus the absolute error in the sum of two numbers is less than or equal 
to the sum of the absolute errors in the individual numbers. 

Now if the error e x is due to rounding-off X x tok x decimal places and 
the error e 2 is due to rounding-off X 2 to k 2 decimal places 

-ilCT* 1 < e x ^ i10~ kl and -^lO - * 2 ^ ^ il0~ k2 . 

That is and ^ 

Hence V\+e 2 \ < il0“ fel +il0“* 2 . 

Hence the error (e x + e 2 ) in the approximation (x x + x 2 ) to the true value 
(X x + X 2 ) lies within the limits ±(^10 -fel + ^10“ fc2 ). 

In particular, as will commonly happen in practice, if both the given 
numbers are rounded to the same number k (say) decimal places, then 
k x = k 2 = k and the error in (x x + x 2 ) lies within the limits + 10“*. Thus 
the approximation may be as much as one unit out in the fcth decimal place. 

Example . The numbers a and b when rounded-off to 3 decimal places 
are 3-724 and 4-037 respectively. Evaluate an approximation to (a + b) and 
consider the error involved. 

An approximation to (a + b) is given by (3-724 + 4-037) — 7-761. The 
error due to round-off lies between —10“ 3 and 10“ 3 , that is between 
— 0-001 and 0*001. Hence the true value of (a + b) lies between 7-760 and 
7-762. To two decimal places this is 7-76 and, while the third decimal 
place is not completely determined, we at least know that it is 0, 1 or 2. 
The result should not be quoted as 7-761 as this gives the impression that 
it is correct to 3 decimal places. It should be quoted as 7-76 to 2 decimal 
places or as 7-761 +0-001. 

Example. The numbers a and b , when rounded to four significant 
digits, are 23-86 and 0 01762 respectively. Evaluate an approximation to 
(a + b) and discuss the error involved. 

An approximation to (a + b) is given by (23-86 + 0-01762) = 23-87762. 
The round-off error lies between +(il0“ 2 +^10“ 5 ), that is between 
0-005005 and -0-005005. 


If a and b are any two real numbers, positive or negative, then |a + h| ^ \a\ + \b\. 
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Hence the true value of {a + b) lies between 23*882625 and 23*872615. 
Thus to four significant digits the true result is either 23*88 or 23*87. To 
three significant digits the result is 23*9. The result can also be given as 
23*878 + 0005. 

For subtraction we have 

X 1 -X 2 = (x 1 -x 2 ) + (e 1 -e 2 ) 


and so \(X l -X 2 )-(x 1 -x 2 )\ = \ei~e 2 \ 

^kil + k 2 |* 

Then if e x and e 2 are rounding errors such that \e x \ ^10 _fcl and 
\e 2 \ < jl0~ k2 we have 

\ei-e 2 \ ^il0" fcl +il0“ fc2 . 

Hence the error in the approximation (x x — x 2 ) to the true value (X x - X 2 ) 
must also lie within the limits ±(^10“ kl +^10“ /C2 ) and not the limits 
±(il0“ kl -il0' /C2 ). 

For example , 

32*75 rounded to 1 decimal place (three significant figures) is 32*8. 
Round-off error is 0*05. 

1*125 rounded to two decimal places (three significant figures) is 1*12. 
Round-off error is — 0*005. 

Subtracting these rounded numbers gives (32*8 —1*12) = 31*68. 
Subtracting the exact numbers gives 31*625. 

Hence, regarding 31*68 as an approximation to 31*625, the error is 

31*625-31*68 = -0*055 

= -(0*05 + 0*005) 

= -(iKT'+iKT 2 ). 

In particular, if two numbers are rounded to the same number k of decimal 
places, the limits for the error in their difference are ±10~ k . 

Example. The numbers a and b when rounded to 3 decimal places are 
3*724 and 2*251 respectively and the numbers c and d when rounded to 
4 significant digits are 23*86 and 0*01762 respectively. Evaluate approxi¬ 
mations to (i) b-a and (ii) c-d and discuss the error in each 
approximation. 


If a and b are any two real numbers, positive or negative then \ a — ^ |a| + |fc|. 
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(i) Approximation to (b — a) is given by (2*251 — 3-724) = —1*473. 
Round-off error lies between —0*001 and +0*001. 

Hence true value of (b — a) lies between —1*474 and —1*472. 

To two decimal places this is —1*47 and the third decimal figure is 2, 
3 or 4. 

The result may be stated as —1*473 + 0*001. 

(ii) Approximation to (c — d) is given by (23*86 — 0*01762) = 23*84238. 
Round-off error lies between ±?(10~ 2 + 10 -5 ) = ±0*005005. 

Hence true value of (c — d) lies between 23*847385 and 23*837375. 
Thus to four significant figures the true value will be either 23*85 or 
23*84. 

To three significant figures the result is 23*8. 

The result could also be given as 23*842 + 0*005. 

Above, we have seen that when two numbers which have been rounded 
to k decimal places are added, the resulting error in the sum lies between 
the limits +10 ~ k . If now we add or subtract a third number, also rounded 
to k decimal places, we will of course introduce a further error. The total 
error now will lie within the limits +(10“ /c + ^10 - *) = ±fl0 _fc . Similarly, 
if n numbers (positive or negative) rounded to k decimal places are added, 
the resulting error will lie within the limits + fl0~*. 

In particular, in the case of 10 numbers all rounded to the second 
decimal place say (that is each with an error within the limits +0*005) 
the resulting error will be within the limits +0*05, that is a whole order of 
magnitude greater than the individual errors. 

Example. The numbers a, b , c and d when rounded to 3 decimal places 
are 3*724,2*251,4*037 and 1*939 respectively. Evaluate approximations to 

(i) a + b — c and (ii) a+b—c—d 

and discuss the error in each approximation. 

(i) (a + b — c) is approximated by (3*724 + 2*251—4*037) = 1*938. 
Round-off error lies between + §10' 3 , that is between +0*0015. 
Hence true value of (a + b — c) lies between 1*9365 and 1*9395. 

To two decimal places this is 1*94. 

(ii) (a + b — c — d) is approximated by 

(3*724 + 2*251-4*037-1*939)= -0*001. 

Round-off error lies between +2 x 10“ 3 , that is between +0*002. 
Hence true value of (a + b — c — d) lies between —0*003 and +0*001. 
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Thus in this case we do not even have one significant figure in our result 
although the numbers we started with all had four significant figures. 
Indeed we are not even certain as to the sign of the true result. This 
situation has arisen because effectively we have had to evaluate the 
difference of two very nearly equal numbers (a + fr) and (c + d ). This is 
a situation which always creates difficulties in numerical analysis. If it 
arises in practice, it may be that it can be avoided by employing an 
alternative method to solve the original problem (see §2.7). If this is not 
convenient, then it will be necessary to use many more significant figures 
in the calculations than are required in the final result, in order to allow 
for the inevitable loss of significant digits when the subtraction is carried 
out. 

2.6.2. Multiplication 

In the same notation as before 

X x X 2 — (Xi + e 1 )(x 2 + e 2 ) 

= x 1 x 2 + e 1 x 2 + e 2 x 1 + e 1 e 2 
~ x 1 x 2 + e 1 x 2 + e 2 x 1 

since the product c x e 2 can be neglected compared with the terms e±x 2 and 
e 2 x x . Then the error in the approximation x 1 x 2 to X x X 2 is (approximately) 
(e 1 x 2 + e 2 x 1 ). 

Also, the approximate absolute error is 

kiX 2 + e 2 x 1 | ^ \e l x 2 \ + \e 2 x 1 \ 

= |x 2 |k 1 | + |x 1 ||e 2 |. 

When e x and e 2 are due to rounding to k x and k 2 decimal places re¬ 
spectively, we have 

\e l x 2 + e 2 Xi\ ^ |x 2 |il0“ kl + |x 1 |^10 kl . 

Further if k 1 = k 2 — K that is if both numbers have been rounded to k 
decimal places, we have 

\e i x 2 + e 2 x 1 \ <(|x 1 | + |x 2 |)ilO k . 

The approximate relative error in the product x^ 2 is 
\e 1 x 2 + e 2 x 1 \ ~ |g 1 x 2 + e 2 x 1 [ 

|XiX 2 | ~ \x x x 2 \ 
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That is, the approximate relative error in the product of two numbers is 
less than or equal to the sum of the relative errors in the two numbers. 

Example. The numbers a and b when rounded to 3 decimal places are 
4-701 and 0*832 respectively. Evaluate an approximation to ab and discuss 
the error involved. 


ab is approximated by (4-701)(0*832) = 3-911232 (using long multipli¬ 
cation). 

The approximate absolute error ^ (4-701 + 0*832)^10“ 3 — 0 0028. 
Hence the true value of ab to 3 decimal places lies between 3-908 and 
3*914. This is normally stated as 3-91 to two decimal places (or 3 significant 
figures) or as 3-911+0*003. 

0*0005 

Alternatively, since the relative error in 4-701 ^~ 0*00011 and 

4*701 

, , . 0*0005 

the relative error in 0*832 ^ — — 0*00060, the approximate relative 

0*832 

error in their product ^ 0-00011 +0-00060 = 0 00071. 

Hence the approximate absolute error in the product ^ 0-00071 (3-9) 

-0-0028 

which agrees with what was obtained above. 


Example. The numbers a and b when rounded to four significant digits 
are 37-26 and 0*02146 respectively. Evaluate an approximation to ab and 
discuss the error involved. 

ab is approximated by (37*26) (0-02146) = 0*7995996 (by long multipli¬ 
cation). 

Approximate absolute error ^ i(37-26 x 10 -5 + 0-02146 x 10 -2 ) 

- 0*00029. 

Hence true value of ab lies between 0*7999 and 0-7993 when rounded to 
four significant figures; that is, to three significant figures the true value 
may be either 0-800 or 0-799; to two significant figures the value is 0*80. 
The result may also be stated as 0-7996 + 0*0003. 

Considering the product x x x 2 x 2 of three inexact numbers we obtain, in 
the same way as above, that 


the approximate absolute error < \xix 2 \\e$\ + \x 2 x 3 \\e x \ + \x 2 Xi\\e 2 \ 


and that the approximate relative error 


ei\ 

+ 


+ 

*3 



Xi 


x 2 


*3 


From these two results it should be clear that, when considering the 
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product of three numbers, it is better to work in terms of relative error 
rather than absolute error (even when it is the absolute error which is 
finally required). 

Example. The numbers a, b and c when rounded to 3 decimal places 
are 4-701,0-832 and 2-413 respectively. Evaluate an approximation to abc 
and discuss the error involved. 


abc is approximated by (4-701)(0-832)(2-413) = 9-437802816. (As we 
shall see it is a foolish waste of time to retain so many digits in the 
approximation.) 


Relative error in 4-701 

Relative error in 0-832 

Relative error in 2-413 


^ 0-0005 
^ 4-701 


- 0 - 00011 . 


0-0005 
^ 0-832 


~ 0-00060. 


0-0005 
^ 2-413 


- 0 - 00021 . 


Hence approximate relative error in product 

<c 0 00011+0-00060 + 000021 
= 0-00092. 

Hence approximate absolute error in product 
^000092(94) 

- 0 0086. 

Thus the true value of the product lies between 9-429 and 9-446 when 
rounded to four significant figures. Therefore the true value of the product 
is 94 correct to two significant figures. The result can also be given as 
9438 + 0 009 or, better, as 944 + 0 01. 

In the same way as above, it can be shown that the approximate relative 
error in the product x x x 2 x 3 ... x n is less than or equal to the sum of the 
relative errors in x A , x 2 , x 3 ,..., x n for any positive integer n. In particular, 
the approximate relative error in x", in which n is a positive integer, is n 
times the relative error in x x . This last result can be generalized to give the 
result that the approximate relative error in x" for any real n is \n\ times 
the relative error in x x . 

Example. The number 7-36 is a correctly rounded approximation to the 
number a. Using tables of square roots obtain as accurate an approxi¬ 
mation as possible to -Ja. 
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The approximate relative error when y/7 -36 is used as an approximation 

to is w) - °' 00034 - 

From (5 figure) tables 7*7*36 = 2*7129 correct to four decimal places. 
Hence the approximate absolute error when 7*7*36 is used as an 
approximation to yja is (0*00034 x 2*7) ~ 0*001. 

Therefore yja lies between ^7*36 + 0*001 and 77*36 — 0*001. 

Hence yja = 2*713+0*001. 

Correct to two decimal places y/a is 2*71. 

(Note that, correct to three decimal places x /7-365 = 2*714 and 
77*355 = 2*712). 

2.6.3. Division 

Again, using the same notation, we have 


f x 2 (x l +e l )-x 1 (x 2 + e 2 ) 
x 2 x 2 {x 2 + e 2 ) 



Xi X\ +^i Xxf X!+ei 

X 2 x 2 + e 2 x 2 \x 2 + e 2 


X\ +x 2 e 1 — x 1 e 2 
x 2 x 2 (x 2 + e 2 )' 


x x l 

Then the error involved in using — as an approximation to —- is 

x 2 A 2 


x 2 (x 2 + e 2 ) \x 2 xi J\ x 2 J 





on application of the binomial theorem.* 

Now, since ei and e 2 are small, so that products of at least two of them 
(for example e\e\, etc.) can be neglected, we see that this error is 

. . , (ei 

approximately I — 


\x 2 


V 2 

x 2 


Thus the approximate absolute error 


ei 


xie 2 


+ 

2 

x 2 


X2 


* See Appendix 1. 
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The relative error is 


x 2 ei —x\e 2 

/ 

X! 


X2 ( 


xi{x 2 + e 2 ) 

/ 

x 2 



V X 2 X2 ) 


_£i_ *1 e 2 

X ! * t X 2 




1 + 


£i 

Xi 


*2 

*2 


1 +^ 

Xi 




£ l __£2 
*1 X 2 


Xi 

< 


e 2 


1 -— 1 -—+ 


x 2 

P 2 

*2 


ei 


Xi 


Hence (as for multiplication) the approximate relative error after division 
is less than or equal to the sum of the individual relative errors. 

From the above results for absolute error and relative error it should 
be clear that it is easier to use relative error when considering division. 


Example. The numbers a and b, when rounded to four significant digits 

a 

are 37*26 and 0*05371 respectively. Evaluate an approximation to - and 

b 

discuss the error involved. 

* . . a . , 37*26 _ 

An approximation to - is given by = 693*725 ... 

b 0*05371 


, . 0*005 0*000005 

The approximate relative error ^ 37^5 + 00537 T 


: 0*00023. 


Hence the approximate absolute error in the quotient 
^0*00023 (690)-0*16. 

Therefore the true value of a/b lies between 693*9 and 693*6 when 
rounded to four significant figures. Hence, to three significant figures a/b 
is 694. 

Note that, as a consequence of the above results for multiplication and 

XiX 2 

division, the approximate relative error in the expression-is less than 

X 3 

or equal to the sum of the relative errors in xi, x 2 and x 3 . 


Example. The numbers a, b and c when rounded to three decimal places 
are 4*701, 0*832 and 2*413 respectively. Evaluate approximations to 
(i) ab/c and (ii) 1/ab and discuss the error in each of these approximations. 


*See Appendix 1. 
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,.ab. . JL (4-701)(0-832) . 

( 1 ) — is approximated by- -= 1-62090... 

c 2-413 

, . 0-0005 0 0005 0-0005 

The approximate relative error sc 4 . 7Q f + q-832 + 2-413 

~ 0-00011+0 00060 + 0-00021 
= 0-00092. 

Hence the approximate absolute error ^ 0 00092(1-6) ^ 0-0015. Therefore 
the true value of ab/c lies between 1-619 and 1-622 when rounded to four 
significant figures. 

Thus the true value is 1-62 correct to three significant figures. The result 
can also be given as 1-621 +0-002. 

(ii) i is approximated by - 0 255674... 

The approximate relative error ^ 0-00011+0*00060 = 0 00071. (Note 
that this is the same as the relative error in the product ab itself, since 
the numerator 1 in the quotient 1/ab is exact and so has no error.) 

Hence the approximate absolute error < 0*00071 (0*26) ~ 0*00018. 
Therefore the true value of 1/ab lies between 0*2555 and 0*2559 when 
rounded to four decimal places. 

Hence the true value of 1/ab may be given as 0*256 correct to three 
decimal places or as 0*2557 + 0*0002. 

2.6.4. Error in functional evaluation 

If e is the error in the approximation x to the number X so that 
X = x + e then, if e f denotes the error when a function / is evaluated 
(exactly) at x instead of at X , we have 

f(X) = f(x) + e f . 

Therefore e f = f(X) — f(x) 

= f(x + e)-f(x) 

= [f (x) + ef\x) + \e 2 f"{x) + * * *] -/(x) 

on expanding/(x + e) in a Taylor series.* 

Therefore e f — ef\x) + \e 2 f"(x) + * * * 

Hence if e is small (and the second and higher derivatives of /evaluated 
at x are not excessively large) we see that e f ~ ef\x\ 


See Appendix 1. 
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Thus \e f \ c* \e\\f'(x)\. 

Because only the first one or two significant figures in an error value 
are useful, it will normally only be necessary to evaluate f'(x) correct to 
two significant figures. 

Example. The example on page 15 obtaining an approximation to -Ja 
is an example of error in function evaluation. 


Here /(x) = yjx 

and so \e f \ ^ |e| |/'(x)| = \e\\jx _i \. 


Hence the approximate absolute error in ^7-36 = 




0-005 

V7-36 


0-001 

2 2-7 


which is the same result as obtained on page 16. 

Example. The numbers a and b when correctly rounded to three decimal 
places are 0-359 and 0*745 respectively. Use tables to obtain as accurate 
approximations as possible to cos a and cos b. 

From six-figure tables cos (0-359) ~ 0-936249. 

Now since the derivative of cos x is — sin x and the maximum absolute 
error in 0-359 due to round-off is 0*0005, we see that an approximation to 
the corresponding maximum absolute error when cos (0*359) is used as 
an approximation for cos a is 0-00051—sin(0-359)| ~ 0-0002. (Note that 
there is no need for anything like six-figure accuracy in the value of 
sin (0-359).) 

Hence cos a = 0*9362 ± 0-0002, or 0-936 correct to three decimal places. 
Also, from tables cos (0-745) ~ 0-735088, 
and 0-0005] -sin(0-745)| ~ 0-0003. 

Hence cos b = 0-7351 ±0 0003, or 0-735 correct to three decimal places. 

EXERCISES 

1. The numbers 11-029, 2-3452, 13-374 and 0 00855 are correctly rounded approxi¬ 
mations to the numbers a, b, c and d respectively. 

Evaluate approximations to 

(i) a + b 

(ii) a + c 
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(iii) a + d 

(iv) a—b 

(v) a — c 

(vi) a — d 

(vii) a + b — c 

and discuss the error in each approximation. 

2. A series of n numbers which have been correctly rounded, not all to the same 
number of decimal places, are to be added together to obtain an approximation 
to their sum. The least number of decimal places given in any of the numbers 
is d. Their sum can be approximated by 

(i) adding all the rounded numbers and then rounding their total to d decimal 
places, or 

(ii) rounding all the given numbers to d decimal places and then adding them. 
Would you expect the results of (i) and (ii) to be identical? If not, which result 
would you expect to be the better approximation to the true sum? 

3. With a, b, c and d as in exercise 1, evaluate approximations to 

(i) ac 

(ii) ad 

(iii) abc 

(iv) ab/c 

(v) 1/ac 

and discuss the error in each approximation. 

4. The numbers 2T4 and 8-27 are correctly rounded approximations to the numbers 
a and b respectively. Using tables of square roots, obtain as accurate approxi¬ 
mations as possible to yja and ^ Jb and state the errors in these approximations. 

5. The numbers a and b when correctly rounded to three decimal places are 0-359 
and 0-745 respectively. Use tables to obtain as accurate approximations as 
possible to sin a and sin b. 


2.7. Reducing round-off error by choice of method 

The effects of round-off errors can be very different using different 
numerical methods (or algorithms) to calculate the same quantity. This 
will be illustrated by considering a few specific examples. 

Example. Consider the evaluation of ^/(x+1) — yfx for x = 700 and 
using four significant digits in the calculation. 

7701-^700 = 26*48-26*46 

= 0*02 

Since both 26*48 and 26*46 have been rounded off, the error in the final 
result will lie between ±0*01 so that we do not even have one-figure 
accuracy. The essential difficulty is that we have subtracted two “very 
nearly equal” numbers, and this has inevitably resulted in a loss of 
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significant digits. One way of overcoming the difficulty is to use more 
significant digits in the intermediate calculations. This may not be con¬ 
venient, however, for a variety of reasons. For example, four-figure tables 
of square roots are readily available, whereas seven or eight-figure tables 
are not so common. Thus, a better way of overcoming the difficulty is to 
use an alternative method to calculate the desired quantity. For example, 
we might proceed in the following way. 

V7M—V700-(V701—V700)( ^;W700 ) 

1 

“■7701+7700 

l 

“ 52-94 
= 0*01889 

The (approximate) relative error in this result is less than or equal to 

5294 ~ 0*00019 and so the (approximate) absolute error is less than or 

equal to 0*000004. Hence, using this method and again using only four 
significant digits in the calculations, we have obtained at least three 
significant digits in the final result. 

Example. Evaluate the roots of the quadratic equation 

x 2 — 60x +1 = 0 

using four significant digits throughout the calculations. 

It is well known that the roots of the quadratic equation 
ax 2 + bx-\-c — 0 (a ^ 0) 

are given by —b±yj(b 2 — 4ac) 

2 a 

Therefore, in the particular case being considered, the roots are given by 

«±^-4> . 30±v>w 

~ 30 + 29-98 

(where the square root is given correct-to four significant digits). 

Hence we obtain for the two roots = 59-98 and x 2 = 0-02. Thus while 



22 


ERRORS 


we have obtained one root (xi) to four significant digits, the other root 
(x 2 ) has only been obtained to one significant digit. Starting from exact 
data (the coefficients of the polynomial) and working always to four 
significant digits accuracy, we have obtained only one significant digit in 
one of the roots. Once again, the essential difficulty is that to calculate the 
root x 2 involved the subtraction of two “nearly equal” numbers, thus 
resulting in the loss of significant digits. It is not difficult to construct a 
polynomial for which, again working to four significant digits and using 
the above formula, no significant digits at all are obtained for one of the 
roots (for example x 2 — 220 x +1 = 0 ). 

To overcome the difficulty we can use a method very similar to that 
used in the previous example to calculate the smaller root x 2 . 

* 2 = 30 -V899 -(30-V899,|[±^| 

1 _ 1 
“30 + 7899 “59-98 

- 0-01667 

(using four significant digits in the calculations). 

The (approximate) absolute error in this result is less than or equal to 
0-00001 so that at least three significant digits have been obtained. 

Note that this method for calculating x 2 is equivalent to using 

(product of the roots of the quadratic) 


1 

“ 30 + 7899 

Thus, in general, when solving a quadratic equation the root with 
larger numerical value should be found first and the second root 
determined by dividing the product of the roots by this value. 

EXERCISES 

1. Evaluate 7479 — 7478 correct to three significant figures assuming that only 
four significant figures can be retained in any calculations. 

2. Evaluate both roots of the quadratic equation 

x 2 — 18x +1 = 0 

as accurately as you can assuming that only three significant figures can be 
retained in any calculations. 
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3. Evaluate approximations to the roots of the quadratic equation 

x 2 + lOOx — 4 = 0. 

Consider the value of the polynomial at the approximate roots and reconstruct 

the polynomial from the approximate roots. 

I_cos X 

4. How would you compute —r—— for small values of x? 

sinx 

5. For values of x near 4, consider the computation of 

1 1 
y/ x 2 
x—4 

Evaluate the expression at x = 3*9. 

2.8. Statistical treatment of errors 

It must be emphasized that the analysis of this chapter has been concerned 
with obtaining upper limits for the absolute errors in the various calcu¬ 
lations. Most often these upper limits will be considerably larger than the 
actual errors occurring, however, and the reason for this is not difficult to 
appreciate. For example, consider the addition of n numbers which have 
been rounded to k decimal places. The maximum absolute round-off error 
in each of the numbers is ^10~ k and so the maximum absolute error in 
their sum is ^nl0~ k . However, in order to obtain an error of such 
magnitude in the sum, the error in each individual number would have 
to have had its maximum possible magnitude and each of these errors 
would also have to have been of the same sign. Clearly the chances of this 
occurring in practice are very small indeed. Normally the magnitude of 
the individual errors will vary and some will be positive and some 
negative (that is some numbers will be rounded up while others will be 
rounded down) so that some cancellation of errors will occur. Assuming 
a random distribution of errors it can be shown using statistical theory 
that for large n a more realistic estimate of the error occurring in the sum 
of n numbers each rounded to k decimal places is i N /rclO _fc . Similar results 
can of course be obtained for the effects of errors on the other basic 
operations of arithmetic. 

It must however be emphasized that these statistical results do not 
apply when only a few numbers are being operated on. Further, in any 
particular case, we can be 100% sure only that the errors involved lie 
within the upper limits obtained in the earlier sections. 


B 



Evaluation of formulae 



IF THE VALUES OF A FUNCTION / ARE REQUIRED FOR VALUES OF THE 
independent variable at intervals of p starting at a and finishing at b , where 
it is implied that (b — a) is equal to a positive integer times p, then we write 
“evaluate / for a(p)b ” 

For example, evaluate x 2 for 1(1)10 means determine the values of x 2 
at unit intervals of x from x = 1 to x = 10, that is for x = 1, 2, 3, 4, 5, 6, 
7, 8, 9, 10. 

The following table shows the values, correct to three decimal places, of 
/for 1(0T)2 where/(x) = 1/x. 


X 

m 

1 

1000 

M 

•909 

1-2 

•833 

1-3 

•769 

1*4 

•714 

1-5 

•667 

1-6 

•625 

1-7 

•588 

1-8 

•556 

1-9 

•526 

2 

•500 


A flow diagram for evaluating/(x) for a(p)b is shown opposite. 


EXERCISES 

x 

1. Tabulate 2 for 0(1)10 correct to four decimal places. 

2. Tabulate log (x* +1) for 5(5)25. 

3. Tabulate 3 sin x° for 0°(10°)90°. 

3.1. The evaluation of polynomials (nesting) 

Consider the evaluation of the polynomial 2x 3 — 5x 2 — 3x + 4 when x = a. 
One way of doing this is first to calculate a 2 and a 3 (= a x a 2 ). This would 
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involve two multiplications. To complete the evaluation would then 
require three more multiplications (one corresponding to each of the 
coefficients of x, x 2 and x 3 ) and three additions. Thus the whole process, 
carried out in this way, would require a total of five multiplications and 
three additions. 

Now suppose that we rewrite the given polynomial in the form 
((2x — 5)x — 3)x + 4 

and evaluate it at x = a in the way suggested by this bracketed form. 
Carried out in this way then the whole evaluation requires only three 
multiplications and three additions. This method of evaluation is known 
as nesting. 

Similarly, evaluation of the polynomial 

a 5 x 5 + U 4 X 4 + a 3 x 3 -f a 2 x 2 + a^x + a 0 
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by first calculating a 2 , a 3 , a 4 , a 5 will require at least nine multiplications 
and five additions. If this polynomial is evaluated in the way suggested 
by writing it in the form 

((((a 5 x + a 4 )x + a 2 )x + a 2 )x + ajx + a 0 

that is by multiplying a 5 by a, adding a 4 , multiplying by a, adding a 3 , 
etc.—the nesting method—then we only require five multiplications and 
five additions. 

In general, the evaluation of the rath degree polynomial 

p(x) — a m x w + a m _ 1 x" I “ 1 + -*- + a 2 x 2 + a 1 x + a 0 with a m ^ 0 

will require at least (2m — 1) multiplications and ra additions if a 2 , a 3 ..., a w 
are evaluated separately, but will only require ra multiplications and ra 
additions if the method of nesting is applied. 

Clearly then the method of nesting should be used when evaluating 
polynomials either on a hand machine or in a computer. 

The process can be set out, showing the intermediate values obtained 
during the calculation, as illustrated below for the evaluation of the 
polynomial a 2 x 3 + a 2 X 2 + aix + a 0 at x = a. 


a 

a 3 a 2 a\ ao 

i „ a b 3 s afc 2 „ oibi 


1 b 3 x b 2 / b\ x 

bo 


= a 3 , b2 — OLb 2 + a 2 , bi=ab 2 + ax and bo = otb 1 +ao . The arrows 
indicate the order in which the values are calculated. The value of b 0 is 
the value of the given polynomial at x = a. (The reader should verify this 
for himself.) 

If a hand machine is being used it is not, of course, necessary to record 
any intermediate values. 

Example. Evaluate the cubic 2x 3 — 5x 2 — 3x + 4 when x = 05. 


0*5 

2 

-5 

-3 

4 



1 

-2 

-2*5 


2 

-4 

-5 

1 1-5 


Hence the value of the given cubic at 0*5 is 1*5. 

Example. Evaluate the quartic 3x 4 + 5x 3 — x 2 — 2x+1 when x = — T2. 


-1*2 

3 

5 

-3*6 

-1 

— 1 68 

-2 

3*216 

1 

-1*4592 


3 

1*4 

-2*68 

1*216 

| -0*4592 
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Hence the value of the given quartic at —1*2 is —04592. 

3.1.1 . Flow Diagram 
A flow diagram for the evaluation of the mth degree polynomial 
a m x m + a m ~ ix m ~ 1 +• * * + aiX + a 0 is shown below. 



EXERCISES 

1. Evaluate the polynomial 

2x 3 — 3x 2 + x —4 

at x = 1-75. 

2. Evaluate the polynomial 

3x 4 —x 3 — 2x —5 

at x = 0-683. Give the answer correct to three decimal places. 
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3.2. Synthetic division of a polynomial by a linear factor 

In this section we consider a method of obtaining the quotient polynomial 
and the remainder when a polynomial is divided by a linear factor. 

For example, let us consider the long division of the cubic polynomial 
2x 3 — 5x 2 — 3x + 4 by the linear factor (x — 0*5). 

This proceeds as follows: 


2x 2 —4x —5 
x—0*5 )2x 3 — 5x 2 — 3x+4 
2x 3 - x 2 
-4x 2 -3x 
— 4x 2 + 2x 

— 5x+4 

— 5x + 2-5 

1*5 


Thus the quotient is the polynomial 2x 2 — 4x — 5 and the remainder is 
1*5. These results should be compared with the table drawn up for the 
example on page 26. This comparison shows that the coefficients of the 
quotient polynomial are the first three numbers (starting from the left) in 
the bottom row of the table. The final number on the right of the bottom 
row of the table is the remainder. Thus in this case we see that, by 
carrying out the nesting procedure for evaluating the given polynomial at 
0*5 and noting the intermediate values, we obtain the quotient and 
remainder when the given polynomial is divided by the factor (x —0*5). 
This result is true in general for the division of a polynomial of arbitrary 
(integral) degree by a linear factor and should be verified by the reader 
for the division of the general cubic a$x 2 + a 2 x 2 + aix + a 0 by the linear 
factor (x —a). 

Thus the quotient and remainder when a polynomial p(x) is divided by 
the linear factor (x — a) can be obtained as a result of using the nesting 
procedure to evaluate the polynomial p(x) at a. 

The process, being essentially nesting, may be recorded in the same 
tabular form as the nesting process and is called synthetic division. 


Example. Use the method of synthetic division to obtain the quotient 
and remainder when the polynomial 3x 5 + 5x 4 -h8x 2 + 7x + 4 is divided 
by (x + 2). 

We construct the table for the evaluation of the given polynomial at 
— 2 using the nesting process. 
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This is 


-2 

3 

5 

0 

8 

7 

4 



-6 

2 

-4 

“8 

2 


3 

-1 

2 

4 

-1 

6 


Hence the quotient is 3x 4 —x 3 + 2x 2 + 4x — 1 and the remainder is 6 . 

EXERCISES 

1. Obtain the quotient and remainder when the polynomial 

5x 4 + 3x 3 —x 2 + 6x —7 

is divided by x—2. 

2. Obtain the quotient and remainder when the polynomial 

3x 7 -fx 6 — 4x 5 + 2x 4 + 2x 3 —x 2 +x — 1 

is divided by x + 3. 

3. Obtain the quotient and remainder when the polynomial 

2-31792x 4 + 6 17843x 3 - 4-40632x 2 + 5-16931x - 9-76802 
is divided by x + 3-47918. (Use six significant figures in the calculations.) 

4. Obtain the quotient and remainder when the polynomial 

3-40617x 5 - 5-68742x 4 - l-02795x 2 + 2-30469 
is divided by x—0-629734. (Use six significant figures in the calculations.) 

5. Express the quotient polynomial and the remainder when a polynomial p(x) is 
divided by (ax — b) f in terms of the quotient polynomial and the remainder when 
p(x) is divided by (x — b/a). Hence use the method of synthetic division to 
determine the quotient and remainder when the polynomial 

x 4 — 2x 3 — 1 lx 2 + 12x + 36 


is divided by 2 x + 1 . 



Finite differences 



4.1. Finite difference tables 

The distance s travelled by a car from a given fixed point will depend on 
the time t for which the car has been travelling. Indeed, since for any 
particular time the car must have travelled a unique distance, the distance 
is a function of time. Let s = f{t). Now suppose that when the distance 
travelled (s metres) is measured at successive 10-second intervals in time 
(t seconds) we obtain the following results: 


t 

5 = m 

0 

0 

10 

214 

20 

736 

30 

1446 

40 

2270 

50 

3164 

60 

4100 


In order to extract more information from the above data it is often 
useful to rewrite the table in the expanded form of a finite-difference table 
as shown: 


t m 
0 0 
10 214 

20 736 

30 1446 

40 2270 

50 3164 

60 4100 


1st Differences 

214 

522 

710 

824 

894 

936 


2nd Differences 

308 

188 

114 

70 

42 


3rd Differences 


-120 

-74 

-44 

-28 


4th Differences 


46 

30 

16 


The first differences are obtained by subtracting each function value 
from the one immediately below it in the table. For example, the top two 
entries in the column of first differences are obtained by subtracting 0 from 
214 and 214 from 736. The second differences are obtained in the same 
way, that is, by subtracting each first difference from the one immediately 
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below it in the table. For example, the top entry in the column of second 
differences is obtained by subtracting 214 from 522. Similarly third and 
higher-order differences can be obtained. It is customary to write even- 
order differences on the same lines as the function values, while odd-order 
differences are written on intermediate lines (see the above table). There 
are three different notations for differences in common use. These different 
notations make use of the symbols <5, A and V which are all pronounced 
“delta”. 


4.1.1. Central differences 

The first notation is that for central differences. Here we denote the first 
difference f(ti) — f(t 0 ) by <5 /i / 2 . The first difference f(t 2 ) — f(ti) is denoted 
by < 5 / 3/2 and so on. In general (/+i/ 2 — ft- 1 / 2 ) = <5/;. Similarly the second 
difference (< 5 / 3/2 — <5/l/ 2 ) is denoted by <5 2 /i, etc. Thus, in central-difference 
notation, the difference table for a function / of an independent variable 
t has the form 


to 
h 
t 2 
t 2 
1 4 


fo 

h 

h 

h 

u 


<5/1/2 

<5/3/2 

¥5/2 

<5/7/2 


<5 2 /i 

<5% 

<5/ 3 


<5/3/2 

< 5 / 5/2 


in which f 0 = f(t 0 \ fi = fit 1 ),..., U = f(U\ 


Example. Evaluate the polynomial /(x) = x 3 — 8x 2 — 4x +1 for integer 
values of x between x = 1 and x = 10 and construct the difference table. 


x 

0 

1 

2 

3 

4 

5 

6 

7 

8 
9 

10 


fix) 

1 

-10 

-31 

-56 

-79 

-94 

-95 

-76 

-31 

46 

161 


-11 

-21 

-25 

-23 

-15 

-1 

19 

45 

77 

115 


-10 

-4 

2 

8 

14 

20 

26 

32 

38 


6 

6 

6 

6 

6 

6 

6 

6 


0 

0 

0 

0 

0 

0 

0 


Note that the third differences all have the same value (namely 6) and so 
all fourth and higher differences will be zero. 
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It is customary to omit decimal points and to use only significant figures 
when writing down differences in a table. When individual differences are 
quoted, however, they should be given in full with the decimal point shown 
(if appropriate). 

Example. Evaluate the polynomial / (x) = x 3 — 8x 2 — 4x +1 for 
x = 0(01)0-5 and construct the difference table. 


X 

fix) 



0 

0-1 

0-2 

1 

0-521 

-0112 

-479 

-633 

-781 

-923 

-1059 

-154 

-148 

0-3 

0-4 

0-5 

-0-893 

-1-816 

-2-875 

-142 

-136 


Note that the third differences all have the value 0-006. 

It is not necessary to use t 0 at the top of the table; instead t 0 may, 
for example, be used to denote a value of t somewhere in the middle of the 
interval. A difference table in central-difference notation with t 0 denoting 
the middle value has the form 


t-2 

t-1 

to 

tl 

t 2 


f-2 

/-I 

fo 

h 

h 


<5/-3/2 
V- 1/2 

<5/l/2 

<%/2 



< 5 3 /-1/2 

<5 3 /i/2 


Referring to the numerical table on page 30, if t 0 = 30 then <5/_ 3/2 = 522, 
<5 2 /i = 70, etc. 


4.1.2. Forward differences 

In forward-difference notation we denote the first difference f(ti)-f(t 0 ) 
by A/ 0 , the first difference /(t 2 ) —/(ti) by A/i, the second difference 
(Aff—Af 0 ) by A 2 / 0 , etc., so that (in this notation) the above difference 
table would have the form 


to 

h 

t 2 

h 

U 


fo 

ft 

h 

h 

u 


A/o 
A/i 
A f 2 
A/3 



A 3 /o 

A 3 /i 


As for central differences it is not necessary to use t 0 at 


the top of the table. 
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4.13. Backward differences 

In backward-difference notation we denote the first difference f(h)— /(to) 
by V/i, the first difference f(t 2 )-f(ti) by V/ 2 , the second difference 
(V/ 2 —V/i) by V 2 / 2 , etc., so that (in this notation) the above difference 
table would have the form 


*0 

h 

t 2 

h 

h 


fo 

fl 

fz 

h 

u 


V/i 

V/2 

V/3 

V/4 




As for central and forward differences it is not necessary to use t 0 at the 
top of the table. 


4.1.4. The relationship between the different notations 

It is important to note that, despite the differences in notation, in any 
particular example it is the same numbers which will appear in the same 
positions in all three tables; that is, <5/ 1/2 , A/ 0 , V/i are simply three different 
ways of denoting the same number. Similarly 3 2 f v A 2 / 0 , V 2 / 2 represent 
the same number, and so on. 

From the example on page 32, we see that if x 0 = 0*3, then 

Sfi /2 = A/ 0 = V/i = -0*923, 

Sf- 5/2 = A/_3 = V/- 2 = -0479, 

<5 2 /- 2 = A 2 /-3 = V 2 /-! = -0*154, 

<5 2 /i = A 2 / 0 = V 2 / 2 = -0*136, 

etc. 


EXERCISE 

Evaluate f(x) = 1/x correct to 4 decimal places for x = 1(1)10 and construct the 
difference table as far as fourth differences. 

With x 0 = 5, what are the values of 5f _ 1/2 , 3 2 / 3 , <5 4 /_ 2 , A/ 2 , A% A 3 /_ 3 , V/ 3 , V 3 /_ x , 

vy 4 ? 

4.2. Propagation of round-off errors in difference tables 

Suppose now that our function values are not known exactly but are 
subject to round-off errors. If the rounding is to 2 decimal places, the 
maximum numerical error in each function value is 0*005. Then the 
maximum numerical error in each first difference is 0*01, that is, one unit 
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in the second decimal place. The maximum numerical error in each second 
difference is 0*02 or two units in the second decimal place. The maximum 
numerical error in each third difference is 0*04 or 4 units in the second 
decimal place and so on. Similar results, of course, hold when the function 
values are rounded to k decimal places for any integer k. 

In the following difference table the function values have been obtained 
correct to two decimal places: 


X f(x) 

1 -41*23 

2 -39*89 

3 -34*97 

4 -26*49 

5 -14*43 

6 1*18 

7 20*37 

8 43*15 


134 

492 

848 

1206 

1561 

1919 

2278 


358 

356 

358 

355 

358 

359 


-2 

2 

-3 

3 

1 


The third differences here are all approximately zero to within the 
accuracy (± 0*04) permitted by round-off errors in the function values. The 
second differences are all approximately constant (namely 3-57) to the 
accuracy (±0-02) permitted by round-off errors in the function values. 

4.3. Differences of polynomials 

We begin this section by constructing the difference table for the poly¬ 
nomial f(x) = x 2 + 2x-1 for (exact) values of x = 1*00(0*2)1*10. 


X 

fix) 



1*00 

2 

804 

8 

1*02 

2*08 - 

o 1 o 

1*04 

2*1616 

olZ 

son 

8 

1*06 

2*2436 

o ZU 
SOS 

8 

1*08 

2*3264 

oZo 
s oz 

8 

M0 

2*4100 

oOO 



The important point to note is that for this second-degree polynomial 
the second differences are constant. 

We recall too that in the example on page 31 we saw that for the 
third-degree polynomial considered there the third differences were 
constant. 

These two examples are illustrations of the general theorem which states 
that the nth differences of the nth degree polynomial 

a n x n + a n ~ iX n 1 + * * * + a^x + ao (a„ # 0) 
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are ti\a n h n where h is the difference between successive values of x at which 
the polynomial is evaluated. The proof of this theorem is by induction 
and is given in Appendix 2. 

It is important to point out that this theorem applies only when the 
function values are exact. If round-off errors are present, the result will 
not generally hold. However, it will be true for the general nth degree 
polynomial that its nth differences will have the constant value n\a„h n to 
within the limits imposed by round-off errors (see § 4.2). 

For example, consider the following difference table for the cubic 
x 3 — 8x + 5. The values of the cubic are rounded to two decimal places. 


X 

/M 



2-0 

2-2 

-3-00 
—1*95 

105 

157 

216 

277 

345 

417 

493 

52 

2-4 

-0-38 

59 

2-6 

1-78 

61 

2-8 

4-55 

68 

3-0 

8-00 

72 

3-2 

3-4 

12-17 

17-10 

76 


7 

2 

7 

4 

4 


Clearly the third differences here do not have the constant value 
3 !(*2) 3 = 0*048. However they are all in fact approximately equal to this 
value to within the limits (± 0*04) imposed by the possible round-off errors 
in the polynomial values. 


EXERCISE 

Verify the result of the theorem of this section for the following cases: 

(i) the polynomial 2x 2 — x+ 1 and x = 0(1)5 

(ii) the polynomial 3x 3 + x 2 — 4x — 5 and x = — 2(0*5)2 

(iii) the polynomial -x 4 +x 2 + 3x +1 for x = 1(0*1)15. 


4.4 Fitting a polynomial to given function values 

The process to be considered in this section is best illustrated by an 
example. Given a set of values of a function f(x) corresponding to a set 
of equally spaced values of x, determine the polynomial of lowest degree 
which fits them. 

x 0 0-5 10 1-5 2*0 2-5 3-0 

/(x) 0-25 -1-50 -1*75 -0-50 2*25 6*50 12-25 
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The first step is to construct the difference table shown below, 
x f(x) 


0-5 -1-50 150 

10 -1-75 150 

1- 5 -0-50 tr n , 150 

2- 0 2-25 150 

2- 5 6-50 ft, 150 

3- 0 12-25 


Since the second differences are constant, the required polynomial must 
have degree two. Further, we must have 2 !a 2 (0‘5) 2 = 1-50 where a 2 is the 
coefficient of x 2 in the required polynomial. Therefore a 2 = 3. 

Now form the difference table for the function g(x) = f(x)—3x 2 . This 
function must be able to be fitted at the given points by a polynomial of 
first degree, and so its first differences should be constant. 


X 

g(x) 

0 

0*25 

0*5 

-2*25 

10 

-4*75 

1*5 

-7*25 

20 

-9*75 

2*5 

-12*25 

30 

-14*75 


-250 

-250 

-250 

-250 

-250 

-250 


Then 1 la,(0-5) = —2-50 where a t is the coefficient of x in the required 
polynomial. Therefore a 1 — — 5. 

Now the values of/(x)—3x 2 + 5x are all equal to 0-25, so that the 
required polynomial is 3x 2 —5x+0-25. 

In this case the given function values could be fitted exactly by the 
quadratic. The method can, however, be extended to the case in which 
the given data values can be fitted only approximately by a polynomial 
of low degree. An alternative method for finding a polynomial fit to given 
data is given in the next chapter in § 5.3. 

It must be emphasized most strongly that, once a polynomial has been 
obtained which fits the values of a function for a given set of values of 
the independent variable, it cannot be assumed (without more infor¬ 
mation) that it will also fit the function at other values of the independent 
variable. This fact is also best illustrated by an example. The function 
cos 2nx evaluated at x = 0,1, 2,... has the constant value 1. Hence, if we 
are given as data the values of this function at these values of x, they can 
be fitted by the polynomial p(x) = 1. Clearly the polynomial does not fit 
the values of cos 2nx at x = j, f,... The problem of determining when 
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a polynomial which fits given function values for specific values of the 
independent variable also fits the function values at intermediate values 
of the independent variable is an interesting one which is unfortunately 
beyond the scope of this book. 


EXERCISE 

Evaluate the function /(x) = 3x 2 — 5x+0*25 + 6 sin27rx for x = 0(0*5)3*0 and 
determine the polynomial which fits these function values. Comment on your 
answer. 


4.5. Relationship between differences and derivatives 

The derivative of a function / at xo can be expressed as 


fix 0 ) = Jim 

h -*-0 


f(x 0 + jh)-f(x 0 -jh) 

h 


lim-<5/ 0 . 

h^oh 


Hence, for sufficiently small h, we have 

/'(Xo) - ^<5/o- 

However, the values f(xo+%h) and /(x 0 — jh) are not tabulated in a 
difference table but instead we have the values /(x 0 ) and f(x 0 + h). These 
values can be used to give an approximation to f'(x 0 +jh). We have 

f'{xo+jh) — - <5 /i/2 = -^{/(xo+h)—f(x 0 )} 


or 


Now 


fill — ^ <5/i/2 • 

f „, v .. /'(xo + ?h) - f'(x 0 - ?h) 

f (Xo) = 158 - 1 - 


Hence, for sufficiently small values of h 9 


/■(*«)=* jj (/'<*»+- ruo -m 


-p 4 *- 
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These results are special cases of the general result, which can be proved 
by induction, that 

in which f& n) denotes the nth derivative of the function / evaluated at x 0 . 

It is interesting to note how the central result of this section ties up with 
the central result of §4.3. In the latter section it was stated that 
the nth differences of an nth degree polynomial in which the coefficient of 
x” is a n have the constant value n\a n h n . Now, since for positive integers 
m < n (and for m = 0) the nth derivative of x m is zero and the nth 
derivative of x" is n!, we see that the nth derivative of the nth degree 
polynomial p n (x) = a n x n + a n ~ \X n ~ 1 + * * • + aix + a 0 isn la n . Hence we have 

Pn'(x) = ~<5>„(X). 

The above approximate formulae are examples of numerical differ¬ 
entiation. Numerical differentiation is not, however, a very satisfactory 
process. To illustrate this we will consider the process for the first 
derivative. 

We have 

1 v-r f(x 0 +jh)-f(x 0 -ih) 
f ll2 C -Sf ll2 - - 

Now the truncation error of this formula is decreased by taking smaller 
and smaller values of h . However, decreasing h will have the effect of 
making the function values f(x 0 +ty) and f(x 0 -^h) more nearly equal, 
so that we are faced with the problem of loss of significant figures 
associated with subtracting two nearly equal numbers. 

Example . Obtain a numerical approximation to the second derivative 
of the logarithmic function at 5 using four-figure logarithm tables and 
(i) h = 0-5, (ii) h = 0-25, (iii) h = 0T25. (To three significant figures the 
answer is —0-0174.) 

We shall use the result 

/"(*o) - 

(i) We require the second central difference of the logarithmic function at 
5. To obtain this with h = 05 we require only the values of the logarithm 
at 4-5, 5, 5-5. 
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Hence 


/"( 5 ) * ; 


logx 

0-6532 

0-6990 -44 

0-7404 

(-0-0044)= -0-0176. 


(ii) As above, we require the values of the logarithm at 4*75, 5, 5*25. 


x 

4*75 

5*00 

5*25 


log* 

0*6767 

0*6990 

0*7202 


223 

212 


-11 


Hence /"(5) =* 0-0011) = -0-0176. 

Thus decreasing h has not changed the accuracy of the result. 

(iii) As above, we require the values of the logarithm at 4*875, 5, 5*125. 

x logx 

4*875 0*6879 

5000 0*6990 

5*125 0*7097 

Hence /"(5) =* (- 0-0004) = - 0-0256. 

In this case, decreasing h has resulted in a much poorer answer. 




Interpolation 



SUPPOSE WE ARE GIVEN THE VALUES OF A FUNCTION / CORRESPONDING 
to several values x 0 , Xi, X 2 , • • •, x n of the independent variable x. Inter¬ 
polation is then the process by which we obtain, from these function values, 
approximations to the function values for values of the independent 
variable which are intermediate to those given above, say for example, 
at a value of x between x 4 and x 5 . 

For example, we may be given the following information 


X 

f{x) = log X 

80 

1*9031 

81 

1*9085 

82 

1*9138 

83 

1*9191 

84 

1*9243 

85 

1*9294 


and asked to determine approximations to log 80*5, log 82*75 or log 84*8 
etc., or given the information 

x f(x) = e x 
0*33 1*3910 

0*34 1*4049 

and asked to estimate e °‘ 333 . 

One way of proceeding is to attempt to draw the graph of the curve 
represented by y = /(x). We can then read off the value of the function 
/ corresponding to a particular value of x. This is not as easy as it may 
appear, however, especially if we are simply given several function values 
and no further information about the function. The problem is seen to be 
even more difficult when we remember that the function values themselves 
are almost certainly not exact. They have most likely been rounded off 
to a given accuracy, even if they are not subject to other errors such as 
experimental errors. 
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For example, if we are simply given the data 

x fix) 


Xo fix o) 

Xi fix i) 

x 2 fix 2 ) 

without any further information on the function /, we can readily plot 
the points (x 0 , /(x 0 )), (*i, fix i)) and (x 2 , /(x 2 )) on graph paper as in 
Figure la. However we have no way of telling whether the graph of 
y = /(x) looks like that shown in Figure lb or like that in Figure lc or 
indeed is something which looks different from either of those. 





Clearly the approximations obtained for the function values at x = x#, 
and x = xg 2 from Figure lb are very different from the approximations 
obtained from Figure lc. 

5.1. Linear interpolation— 

Newton’s forward-difference formula of degree one 

Suppose we are given the function values Y 0 = fix o), Yi = /(xi)(xi > x 0 ) 
and that we desire to obtain an approximation to /(x 0 + k) where 
x 0 < Xo + fc < Xi. Clearly if we knew the shape of the curve y = /(x) 
between x = x 0 and x = xi we could determine /(x 0 + k) very accurately. 
However, if the shape is not known, we can take as a first approximation 
a straight line, that is, we can assume that the curve is approximately linear 
between the points A(x 0 , Y 0 ) and B(xi, Yi) (Figure 2). Then an approxi¬ 
mation to Y k = fixo + k) can be obtained from the graph (Figure 2). It is 
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not necessary, however, to use the graph to determine the approximation 
to /(x 0 + fc); instead we can determine it analytically as follows. 

The gradient of the straight line AB is 

/(Xi)-/(Xq) 

Xl~Xo 

Hence an equation for the straight line AB is given by 


or 


y-f(x o) 


f{xi)-f{x 0 ) 

Xi-Xo 


(.X-Xo) 


y 


= f(x o) + 



(/(*i)-/(*o))- 


Then when x = x 0 + k the corresponding value of y (that is, the approxi¬ 
mation y k/h to /(xo + fc)) i s given by 

y’kih = f(x 0 )+—-— (f(xi)-f(x 0 )) 

Xi-Xo 

= f(xo) + k h (f(x,)-f(xo)) 

, k a. r 

where Xi — x 0 = h. 

If now the fraction k/h is denoted by 6 , we have 

yk/h ~ ye = (1 ~ 6)f(xo) + @f(xi) =/ o + 0A/ o . 

In most applications 6 will be between 0 and 1. 

This method of determining an approximation to f(x 0 + k) by assuming 
that the curve y = f(x) is linear between (x 0 , /(x 0 )) and (xi, f(x i)) is 
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Called linear interpolation . The approximate formula 

f(x o + ()h) ~f 0 +9Af 0 

is called Newton’s forward-difference interpolation formula of degree one. 

Example. From the data given on page 40 determine approximations 
to log 80*5, log 82*75 and log 84*8 using linear interpolation. 


X 

logx 

80 

1*9031 

81 

1*9085 

82 

1*9138 

83 

1*9191 

84 

1*9243 

85 

1*9294 


1st differences 

54 

53 

53 

52 

51 


Using Newton’s forward-difference formula we have 

log 80-5 ~ 1-9031+ Y (00054) 

= 1-9058. 

(From 4-figure tables, log 80*5 = 1*9058.) 

log 82*75 - 1*9138 + 0*75 (0*0053) 

= 1*9178 (to 4 decimal places). 

(From 4-figure tables, log 82*75 = 1*9178.) 

log 84*8 - 1*9243 + 0*8 (0*0051) 

= 1*9284 (to 4 decimal places). 

(From 4-figure tables, log 84*8 = 1*9284.) 

Example. From the data given on page 40, determine an approximation 
to e° 333 using linear interpolation. 

x e x 1 st differences 

0*33 1*3910 nq 

0*34 1*4049 

Then e 0 ' 333 ~ 1-3910+^^(0-0139) = 1-3952 (to 4 decimal places). 

(From tables e°‘ 333 = 1*3951 to 4 decimal places.) 

We must now discuss the errors in the application of linear inter¬ 
polation. We shall not consider “mistakes” but shall look at the effect on 
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the interpolated value of round-off (or other) errors in the given data and 
the error of the method itself (truncation error). 

5.1.1. Effect of round-off in the given data 
Let the given data be 


x fix) 


x 0 yo 

*1 yi 

We shall assume that y 0 and yi are subject to round-off errors e 0 and e x 
respectively, so that the corresponding exact (but unknown) values Y 0 and 
Yi are given by 

Yo = yo + e 0 

and Yi=yx + ei. 

Now the approximation ye to f(xo + 9h) which we calculate using linear 
interpolation is given by 

ye = (l-0)y o + 0yi. 

However, what we really want is Y e where 

Y e = (l-6)Y 0 + 9Yi. 

(Remember that we cannot actually calculate Y e as Y 0 and Y x are unknown, 
so the best we can do is to calculate y e .) 

Hence the error e e in our approximation, due to the round-off errors 
in the given data, is given by 

e 0 — Ye — ye 

= (l-9)(Y 0 -y 0 ) + 9(Yi-yi) 

= (l-9)eo + 9ei. 

Hence \e e \ ^ (1 — 9)\e 0 \ + 0|ei| 

since (1 — 9) and 9 are both positive for 0 < 9 < 1. 

Then, if both \e 0 \ and \ei \ are less than or equal to e(> 0), 

\e e \ ^ e. 

For example, if both y 0 and yi are rounded to k decimal places then 
|£? 0 1 and \ex | are each less than or equal to j10~ k and so \e e \ will also be 
less than or equal to il0 -fc ; that is, in the absence of any truncation error 
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of the method, the interpolated value will also be correct to k decimal 
places. 

5.1.2. Truncation error of the method 

This error is due to the fact that we have approximated to the curve 
between the points (x 0 , y 0 ) ar >d (xi, Vi) by a straight line when in fact the 
curve has some other shape in that interval. It will in general be rather 
difficult to estimate this error accurately for given data. 

Here we start from the data 

X fix) 

x 0 /(* o) = Y o 

X, fix,) = y, 

that is, we assume that the function values Y 0 and Y, are known exactly. 
The approximation to fix o + 0h) which we obtain using linear inter¬ 
polation is then 

il-d)Y 0 + 6Y 1 . 

Hence the truncation error is 

fix o + 6h)-(l-d)Y 0 -6Y l = fix o + Oh) - (1 - 0)/(x o ) - 0/(*i)- 
But from Taylor’s series expansion* we obtain 

fix 0 + 9h) = fixo)+9hf'ixo)+U0h) 2 f"ix o ) + -- • 
and fix,) = fixo + h) 

= fixo)+hf'ixo)+jh 2 f"ixo) + - • ■ 

Therefore the truncation error is 

/(x 0 )+ ehf'ixo)+m)T'(xo)+■■■ 

-fix 0 ) + Ofix 0 ) 

- Gfix o) - ehf'ixo) - kehTixo) +... 

= ±6h 2 i0-l)f"ix o )+---. 

The first term in the series, namely %6h 2 i6 — l)/"(x 0 ), is usually the most 
important one and is called the principal truncation error. When h is small 
(compared with one) the principal truncation error will normally give a 
satisfactory estimate of the total truncation error. 


See Appendix 1. 
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It must be emphasized, however, that evaluating /"(x 0 ) is often very 
difficult and unsatisfactory (especially when the function itself is not 
known but only the function values). 

EXERCISES 

1. From the data given on page 40, determine approximations to log 83*5, log 81-33 
and log 84-25 using linear interpolation. 

2. Given that, correct to four significant figures, ^/150 = 5-313 and ^160 = 5-429, 
obtain using linear interpolation an approximation to ^153-8. (To four signifi¬ 
cant figures ^/153-8 = 5-358.) 

3. Given that, to four significant figures, e 1 ' 15 = 5-755 and e 1 ' 80 = 6*050 obtain, by 
linear interpolation an estimate of e 1 ’ 784 (To four significant figures e 1 ’ 784 = 
5-954.) 

4. From the values of 1/x 2 correct to 3 significant figures when x = 2 and x = 3 
obtain an estimate of l/(2-54) 2 using linear interpolation. (Correct to three 
significant figures l/(2-54) 2 = 0-155.) 

5.2. Quadratic interpolation— 

Newton’s forward-difference formula of degree two 

Instead of joining two neighbouring points with a straight line and using 
this for interpolation at intermediate points as in the previous section, we 
can join three neighbouring points with a quadratic and use this curve 
for interpolating. This method is known as quadratic interpolation . 

To begin with we must be given at least three values of x, say x 0 , xi, x 2 
and their corresponding function values /(x 0 ), /(xi), /(x 2 ). The first step 
is then to determine an equation for the quadratic curve passing through 
the three points (x 0 ,/(x 0 )), (xi,/(xi)), (x 2 ,/(x 2 )). This could be done 
simply by substituting in turn the coordinates of these points in an 
equation of the form y = a 0 + «ix + a 2 x 2 and solving the resulting 
simultaneous equations 

a 0 + a 1 Xo + a 2 Xo = f{x 0 ) 
a 0 + aiXi +a 2 x? = f(x i) 
a 0 + tfix 2 + a 2 x 2 = /fe) 

for a 0 , a x and a 2 . However, the amount of work involved can be con¬ 
siderably reduced if we start with the quadratic equation in the form 

y = A 0 + Ai(x-xo) + A 2 (x-xo)(x-xi). 

(Note that the right-hand side of this equation is second degree in x and 
can be written in the form a 0 + uix + a 2 x 2 .) 


QUADRATIC INTERPOLATION 

Now since the curve must pass through the point (x 0 , f(x o)) 

f(x o) = Aq. 

Since the curve also passes through the point (xi, /(xi)) 

/(Xj) = y4 0 + ^l(Xl-X 0 ) 

= f{x 0 ) + A 1 h 

where x t — x 0 + h. 
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Hence 


, /(xi)-/(x 0 ) l Kf 

A, - - h -= jA/o. 


Since the curve also passes through the point (x 2 , f(xi)) 

/(x 2 ) = ;4o + ^4l(*2 — Xo) + ^ 2 (x 2 “ Xo)(x 2 — Xi) 
— f(xo)+jAfo{2h) + A 2 {2h)(h) 

where x 2 = x\ + h — Xo 4- 2 h. 


Hence 


Al = 2 {/fe)-/(^o)-Wo} 


- 2 ?^- 


Therefore, we have 


y , x_x o A/ - , (x-x 0 )(x-xx) 2 

y = /o + —A/o +- — 2 - A/o. 


Now, when x = x 0 + Oh the corresponding value of y (that is the approxi¬ 
mation y e to /(x 0 + <>h)) is given by 

ye = fo + 0Af o +j6(0—l)A 2 fo- 

If possible x 0 should be chosen so that 0 lies between 0 and 1. The 
approximate formula 

/(xo + Oh) ~ f 0 + OAf 0 + mO - l)A 2 / 0 

is called Newton’s forward-difference interpolation formula of degree two. 
Expressions for round-off errors and truncation errors can be obtained in 
the same way as in the previous sections. 
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Example. Use Newton’s forward-difference interpolation formula of 
degree two and the values of cos 30°, cos 60° and cos 90° to obtain an 
estimate of the value of cos 50°. 

x° cos* A(cosx) A 2 (cosx) 

30 0866 _ ^ 

60 0-500 -134 

90 0-000 

Here h = 30° and 9 = §§ = §. 

Hence cos 50° - 0-866+f( - 0-366) +H(-j)(- 0*134) - 0-637. 

From tables the value of cos 50° is 0-643. 

Note that, using linear interpolation, we obtain 

cos 50° - 0-866+f(-0-366) = 0-622. 

Thus, in this case, quadratic interpolation has produced the better 
approximation. 


EXERCISES 

1. Given that, correct to four significant figures ^150 = 5-313, ^/160 = 5-429 and 
^170 = 5-540, obtain using quadratic interpolation an approximation to ^ 153-8. 
Compare the approximation with that obtained in exercise 2 on page 46 and 
with the true value, correct to four significant figures, 5-358. 

2. Given that, to four significant figures, e 1 ' 10 = 5-474, e 1 75 = 5*755, e 1 80 = 6*050 
and e 1 ’ 85 = 6-360, obtain using quadratic interpolation estimates of e 1 ' 784 and 
e 1 * 725 . Compare the estimate of e 1 784 obtained here with that obtained in exercise 
3 on page 46 and with the true value, correct to four significant figures, 5-954. 

3. From the values of 1/x 2 correct to three decimal places, when x = 2, 3, 4, obtain 
an estimate of l/(2-54) 2 using quadratic interpolation. Compare this approxi¬ 
mation with that obtained in exercise 4 on page 46 and with the true value, 
correct to three significant figures, 0-155. 

5.3. Higher-order formulae 

In a similar way we can join four neighbouring points with a cubic, five 
neighbouring points with a quartic, and so on, and then use these curves 
for interpolating at intermediate points; that is, we can develop cubic and 
quartic interpolation formulae and so on. Newton’s forward-difference 
formula of degree n is 

y» — /o + eAfo+m -1) A 2 /o + ... 0(0 -1)... (0 - n 4- l)A"/o. 
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It does not follow, however, that using a higher-order interpolating 
formula will necessarily give more accurate interpolated values. Whether 
it does or not depends upon the shape of the curve on which the given 
data should lie. One way of obtaining an indication of the degree of 
interpolating polynomial which should be used in a particular situation is 
by considering the differences of the given function values as in §4.4. 

Example . Given the data 


x f(x) 

1-0 2-98 

1-2 3*55 

14 4-45 

1-6 5*63 

1*8 716 

20 901 

use interpolation to obtain an approximation to the value of the function 
/at 1-5. 

In order to obtain an indication of which degree of interpolating 
polynomial should be used, we first form a difference table for the given 
data. 

x / 

10 2-98 

1-2 3*55 

14 445 

1-6 5*63 

1*8 7*16 

2*0 9*01 

From this we see that the second differences are approximately constant. 
(The third differences are alternately positive and negative and the 
magnitudes of the higher-order differences increase.) Thus the given data 
are most closely approximated by a second degree polynomial and so a 
second degree interpolating polynomial should be used. 

Newton’s forward-difference interpolating polynomial of degree two is 

/o + 0A/ o +i0(0-l)A 2 / o . 

Hence, using this polynomial for interpolating with f 0 = /(14) and 9 = j 
we obtain as our approximation for /(1-5), 

445 + i(lT8)+i(i)(-i)(0*35) ~ 5*00 

(rounded to two decimal places). 


57 

90 

118 

153 

185 


33 

28 

35 

32 
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Example. By differencing the following table, find the degree of the 
polynomial of minimum degree which will exactly fit the data 

x 0 05 10 1*5 20 2*5 30 

f(x) 0*25 —1*50 -1-75 -050 2-25 6*50 12-25 

and obtain this polynomial using Newton’s forward-difference formula of 
appropriate degree. 


* /(*) 


0 

0-25 

0-5 

-1-50 

1-0 

-1-75 

1-5 

-0-50 

2-0 

2-25 

2-5 

6-50 

3-0 

12-25 


-175 

-25 

125 

275 

425 

575 


150 

150 

150 

150 

150 


Since the second differences are constant, the required polynomial must 
have degree two. This polynomial can now be determined by obtaining 
Newton’s forward-difference interpolating polynomial of degree two for 
the interval starting at 0. The general expression is 

fo + 0A/ O +^0(0 — l)A 2 / 0 
from which we obtain here 

since 0 — x/h and h = 0-5. On simplification this gives the polynomial 

3x 2 — 5x + 0\25. 

Note that because the polynomial exactly fits all the data the same 
result is obtained using Newton’s formula for any of the given intervals 
(provided of course the appropriate first and second differences are 
known). Care must be taken in the interpretation of 0 however. For 
example, for the interval starting at 1*5, 


0-5 
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and so we obtain the polynomial 
x —1*5 


-050 + - 


0-5 


■ (2*75) + i 


x—1*5\ (x — 1*5 


0*5 


0*5 


-1 (1*50). 


On simplification this again reduces to the polynomial 3x 2 —5x + 0*25. 
This example should be compared with that in §4.4 on page 36. 


EXERCISES 

1. Given the data 

x 1*0 1*1 1*2 1*3 14 1*5 

/(x) 8*01 9*69 11*56 13*61 15*84 18*26 

use an interpolation formula of appropriate degree to obtain an approximation 
to the value of the function / at 1*36. 

2. Given the data 

x 0 *2 *4 *6 *8 1*0 

f(x) 0*782 1*033 1*283 1*534 1*784 2*035 

use an interpolation formula of appropriate degree to obtain an approximation 
to the value of the function / at 0*452. 

3. By differencing the following table, find the degree of the polynomial of minimum 
degree which will exactly fit the data 

x 0 *5 1*0 1*5 2*0 2*5 3*0 

/(x) 0*75 -0*25 -0*25 0*75 2*75 5*75 9*75 

and obtain this polynomial using Newton’s forward-difference formula of 
appropriate degree. 

4. By differencing the following table, find the degree of the polynomial of minimum 
degree which will exactly fit the data 

x 1*0 1*2 1*4 1*6 1*8 2*0 

/(x) 2*25 1*73 0*97 -0*03 -1*27 -2*75 

and obtain this polynomial using Newton’s forward-difference formula of 
appropriate degree. 
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IN THIS CHAPTER WE WILL BE CONCERNED WITH THE DETERMINATION OF 

approximate numerical methods of integration. These methods are 
especially useful in the following three situations: 

1. The function to b e int egrated is such that no analytical method 
exists. For example JJ-^/sinx dx. 

2. An analytical method exists but is rather complex. For example 



(see Chapter 1, page 2). 

3. The function to be integrated is not known explicitly but instead we 
are given only a series of function values for values of the independent 
variable in the interval of integration [a, b]. 

For cases 1 and 2 the first step is to evaluate the function /, which is to 
be integrated, for a series of values of the independent variable covering 
the interval of integration [a, b]; that is, cases 1 and 2 are first reduced to 
case 3. 

Now the series of values of the independent variable x and the corre¬ 
sponding values of the function / can be plotted on graph paper as in 
Figure 3. 



Then, if the shape of the curve y = /(x) is known, it can be drawn 
(Figure 4) and the area under this curve between x = a and x = b (that is, 
the integral of /(x) with respect to x from x = a to x = b) estimated. 
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One way of estimating this area is to count the number of squares of 
the graph paper which are included in the area. An allowance has of course 
to be made for all the incomplete squares included (Figure 4). An upper 
bound to the integral is obtained by counting each incomplete square as 
if it were a complete square, and a lower bound is obtained by simply 
neglecting any incomplete squares. 

Very often (as in Chapter 5) we know nothing about the shape of the 
curve represented by y = /(x) in the interval from x = a to x = b, and so 
we start by making some assumption as to its shape. The easiest assump¬ 
tion to make is that the curve can be approximated by a straight line 
between any two consecutive points (Figure 5a). Alternatively we may 
assume that the points, taken in threes, lie approximately on a parabola- 
type curve (that is, a curve with an equation of the second degree) which 
may be different for each set of three points as in Figure 5 b and so on. 



The different approximations we make as to the shape of the curve will 
lead to different numerical integration formulae and hence to various 
approximations to the integral. 
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6.1. The trapezoidal rule 

It is required to obtain an approximation to \ b f(x)dx where the 
values of the function / are known for a sequence of equally spaced 
values of x between x = a and x = b. Denote the values of x by x r 
(r = 0,1,2,..., n) where x 0 = a, x r = x 0 + rh , x n — x 0 + nh = b and h is a 
constant, and denote the corresponding function values by f r ; that is 
f r =/(* r ) =/(*o + rh ) ( see Figure 6). 



Since we do not know the shape of the graph of y = f(x) we shall use 
as a first approximation the curve obtained by joining each consecutive 
pair of points (x r ,/ r ) and (x r+ *i, / r +i) for r — 0,1,2,..., (n — 1) by a straight 
line (that is, a curve of degree one) as in Figure 6. 

Now the equation of the straight line joining the points (x 0 , fo) and 
(xi, /i) is 

y = /o + (x—x 0 ) (———^. 

\Xi. x 0 / 

Then with this approximation to /(x) in the interval from x 0 to xi, we 
see that 


f 


f(x)dx ~ the area of the trapezium ABCD (see Figure 6) 


fo + (x — Xo) 


fi-fo 


dx 


= /o(xi-x 0 )+i(xi-x 0 ) 

= ?h(f 0 +fi). 


Xi — Xo 

2 I fl—fo 


Xi -Xo 
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Similarly 


j: 


f(x)dx ~ the area of the trapezium BEFC (see Figure 6) 


= jHfi+fi)- 

Hence, summing over all the trapezia between x = a and x = b, we see that 


I 


f(x)dx = 


f(x)dx 


f(x)dx 


f xi fx 2 r x„ 

= f(x)dx+ I f(x)dx + - •• + I 
J Xo Jxi Jx n -i 

— jh(fo+ + -1-/2) + ' * '+iM/n-l + /n) 

= i^(/o + 2/i + 2/2 + ‘ * * + 2/„-i +^). 


This is the trapezoidal rule and is of course exact if the curve represented 
by y = f(x) is a straight line or a series of straight lines as in Figure 6. 

Example. Use the trapezoidal rule to obtain from the data contained 
in the following table an approximation to J'*/ (x)dx. 

x f(x) 


2-0 1-7321 

2 - 5 1-8708 

3 - 0 2-0000 

3 - 5 2-1213 

4 - 0 2-2361 

Here h — 0-5 and so from the trapezoidal rule we obtain 

j*V(x)dx 0-5(1-7321+ 2 X 1-8708 + 2 x 2-0000 + 2 x 2-1213+ 2-2361) 

= 0-25(15-9524) 

= 3-9881. 

This result is of very limited value as it stands, since we do not know how 
good an approximation it is to the true value of the integral. How large is 
the error in the result due to the round-off errors in the original data? How 
large is the truncation error in this case? 


We must now investigate the errors involved in applying the trapezoidal 
rule. Once again we shall not consider “mistakes” but shall only look at 
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the error due to round-off (or other) errors in the given data and the 
truncation error of the method (i.e. the error due to approximating the 
curve by the straight lines). 

6.1.1. Effect of round-off on the trapezoidal rule 

Let the error in f r be e r for r = 0, 1, 2, , n. Then the modulus of the 

error in the integral due to these errors in the given data is less than or 
equal to jh(\e 0 \ + 2\ei\+2\e 2 \ + - - m + 2\e„-i\ + \e n \). Now if each of the 
absolute errors \e r \, r = 0, 1, 2, ..., n, is less than or equal to \e\, as for 
example when they are entirely due to rounding the given data to a fixed 
number of decimal places, the modulus of the error in the integral is less 
than or equal to 

±h(\e\ + 2\e\ + --- + 2\e\ + \e\) = nh\e\ 

= (b~a)\e\. 

Hence, if the errors in the function values / r , r = 0,1, 2,..., n are entirely 
due to rounding-off to k decimal places, then \e\ ^ il0 _fc and so the 
modulus of the corresponding error in the integral is less than or equal to 
%(b — a) 10 -fc . Note that this bound on the error is independent of n and h. 

Example. Estimate the error in the result of the last example due to 
possible round-off errors in the given function values. 

Since the function values are given to four decimal places, the maximum 
absolute error in each is ^1CT 4 . 

Hence the modulus of the error in the integral value, due to this source, 
is less than or equal to (4 — 2)^10" 4 = 10“ 4 = 0-0001. 

Thus, even if there were no truncation error, the last digit in the value 
3-9881 could not be regarded as accurate. 

6.1.2. Truncation error in the trapezoidal rule 

As for interpolation, the truncation error is more difficult to estimate 
than the round-off error. However, if the function / is sufficiently 
differentiable it can be shown, using Taylor series, that the modulus of 
the truncation error for the integral of / from a to b is less than or equal to 

-hh 2 (b-a)\f"(£)\ 

where |/"(£)l = max {\f"(x)\; a^x^b}. 

Further, if it is known that /" is always positive in the interval from a to 
b, then the truncation error will be negative, while if /" is always negative 
then the truncation error will be positive. 
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Clearly, the smaller the value of h, then the smaller will be the modulus 
of the maximum truncation error. 

Example . Given that the function f(x) whose values are given in the 
example on page 55 is (1 +x) 1/2 , estimate the truncation error in the value 
of the integral obtained in that example. 

Denote the truncation error by E. Then in the notation used above, 

\E\<±h 2 (b-a)\f'W\ 

= 

Here f(x) = (l + x) 1/2 . 


/'(*) = *(l+*r 1/2 and f"(x) = -1(1+ x)~ 3/2 = 4 ^ (1 | x) 3 • 

Clearly then /"(x) has its maximum numerical value in the interval 
2 < x < 4 when x = 2; that is £ = 2 and so 

'™~w = wy 

Hence 1*1 S A(°-5 ) s 4(-L) a 0002. 

But, from the form of /"(x), it is obvious that /"(x) is always negative for 
2 ^ x < 4 so that the truncation error E will be positive. 

Thus 0 ^ E ^ 0-002. 

Comparing this with the round-off error for the same problem (obtained 
in the previous example) we see that in this case the truncation error is 
dominant (and so the round-off error may be neglected). 

Hence we see that any digits after the third decimal place may be 
meaningless in the value obtained for the integral. 


Therefore 


3-988 < 


% 

(l+x) 1/2 dx ^ 3-988 + 0-002 = 3-990. 

2 


*4 

J 2 


(1 +x) 1/2 dx = 3-99 to two decimal places. 


Therefore 
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From ordinary (non-numerical) integration, we have 

\l+x) 1/2 dx = [f(l+x) 3/2 ]* 

= |(V125-V27) 

= 3-989 

correct to three decimal places. 

Now it must be emphasized that it will not always be possible to carry 
out an analysis similar to that of this last example. In very many cases it 
may be no easy task to determine /"(x) and then, even when this can be 
done, there still remains the problem of determining that is, the value 
of x in the interval of integration for which /"(x) has its maximum value. 
The procedure which is normally adopted when the truncation error 
cannot be readily estimated is illustrated in the following sub-section. 

It should also be pointed out that the estimates we have obtained for the 
errors are all maximum values, and the actual errors occurring in any 
particular case may in fact be much less than these estimated values. Thus, 
while the expression obtained above for the maximum truncation error is 
of considerable theoretical importance, its practical value is very limited 
indeed. 

6.2.3. Application of the trapezoidal rule 

We shall consider separately the cases when 

1. the modulus of the maximum truncation error, fih 3 n \/"(£)|, can be 
readily evaluated and 

2. the expression ri/i 3 n|/"(£)l cannot be readily evaluated. 

1- Tih 3 n can be readily evaluated. 

If the accuracy required in the value of the integral is known, then we 
can calculate a suitable value of h and the number of decimal places which 
should be retained in the function values in order to achieve this. This 
means that the work of actually evaluating the integral is then minimized. 
Using an unnecessarily small value of h would involve evaluating the 
function at unnecessarily many points, and retaining too many digits in 
the function values themselves would involve extra (and unnecessary) work 
in the calculations. This latter point is unimportant if the calculations are 
being carried out on a computer. 

Example. Determine a suitable value of h and the number of decimal 
figures which should be retained in the values of (l + x) 1/2 in order to 
evaluate j*(l +x) Ll2 dx correct to three decimal places. 
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If the values of (l + x) 1/2 'are rounded to k decimal places, then the 
resulting absolute round-off error in the integral will be less than or equal 
to (4-2)±l(T* = 10“* 

We must try to choose k so that this error will not affect the value of 
the integral when rounded to three decimal places. This should normally 
be the case if 10 _fc does not exceed 0-00005. Hence use k — 5, that is, 
round-off the values of (l4-x) 1/2 to five decimal places. The modulus of 
the error will now be less than or equal to 0*00001. 

The modulus of the maximum truncation error is less than or equal to 


*h 2 (fc-a)|/"(£)l = hh 2 2 



(as in the previous example). Once again we require this error not to exceed 
say 0-00005, that is 


s; 0 00005 


therefore 


72^3 

h 2 < (0-00005)72^/3 ~ 0-0062 


therefore h < 0-078. 

Hence, a convenient value of h would be 0-05. 

(Note that 0-07,0-06 would not be suitable since the length of the interval 
of integration (namely 2) is not exactly divisible by either of these numbers.) 

With h = 0-05, the modulus of the truncation error is less than or 
equal to 


(0-05) 2 

72V3 


- 0 00002 . 


If the integral is now evaluated with h = 0-05 and retaining five decimal 
places in the values of (1 +x) 1/2 , the error in the result should lie between 
— 0-00001 and 0-00003, which should enable the final value to be obtained 
correct to three decimal places. 

Now if our function values are specified to a given accuracy, this will 
limit the accuracy we can obtain in the final result; in theory we could 
then calculate a critical value of h such that using values smaller than this 
critical value will not increase the accuracy of the result. The error due to 
round-off would be dominant. 

Alternatively, if h is given, this will limit the accuracy of the final result, 
and in theory we could calculate the maximum number of decimal places 
it would be useful to keep in the function values. Using any more decimal 
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places would not increase the accuracy of the final result. The truncation 
error would be dominant. 

2. j 2 h 3 n\f"( 0 \ cannot be readily evaluated. 

This is the case most likely to occur in practice. Indeed, even when 
T 2 h 3 n\f"(£)\ can be evaluated, it will still often be preferable to proceed in 
the following way rather than as described above. 

6.1.4. Interval halving method 

From the form of the truncation error we see that it can be reduced by 
using a smaller value of h. The method to be adopted now is as follows. 

Starting with a relatively large value of h evaluate an approximation to 
the integral. Now replace h by \h and evaluate another approximation 
to the integral. Replace \h by \h and evaluate another approximation to 
the integral, and so on. If the result is required to k significant figures, 
then this process of successively halving the interval h in x should be 
continued until the (/c+ l)th significant figure is constant, or at least until 
the variations in this figure are so small that the correct rounding-off to 
k figures can be carried out. If the (/c+ l)th figure is five, however, it may 
be necessary to carry out a further step in the process and to look at the 
(k + 2)th figure to determine whether the five in the (k+ l)th place is itself 
a result of rounding up so that the correct rounding to k figures can be 
carried out. The need for considering at least the (fc + l)th figure can be 
illustrated as follows: 

Suppose § b a f(x)dx is required to two decimal places; using interval size 
/z, the value 3*719 is obtained for the integral, and using interval size \h 
the value 3*724 is obtained. Correct to two decimal places these two values 
are the same, namely 3*72, and it might be thought that the result has now 
been obtained correct to two decimal places. However, using an interval 
size \h the result could quite easily be 3*727, which correct to two decimal 
places is 3*73. 

Readers familiar with elementary computer programming will readily 
realize that this method of successively halving the interval h is especially 
suitable for carrying out on a computer and will require quite a short 
program. 

While this method does not involve estimating the truncation error, it 
may still be necessary to estimate, as above, the error due to round-off 
errors in the given data to ensure that we are not demanding a greater 
accuracy in the final result for the integral than the given data (that is, the 
function values) are capable of supplying. 
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Example. Use the trapezoidal rule and values of (1 + x) 1/2 correct to two 
decimal places to obtain an approximation to the integral J*(l 4- x) 1/2 dx. 

The modulus of the error in the integral due to round-off errors in the 
values of (1 + x) 1/2 is less than or equal to (4 — 2)il0 -2 = 0*01. Hence in 
the final result any digits after the second decimal place may be 
meaningless. 

With h = 2 (see Figure 7a), we obtain the following values: 

x (l + x) 1/2 


2 1*73 

4 2*24 



Then j 4 (1 + x) 1/2 dx ~ i(2)(l-73 + 2 24) 

= 3-97. 

Then, with h= 1 (see Figure 7b), we must evaluate 4 1/2 . 


x (l+x) 1/2 

3 2-00 
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Then J (l + x) i/2 dx ~ i(l)(l-73 + 2(2-00) + 2-24) 

= i(3-97 +2(2-00)) 

= i(7-97) 

= 3-985. 

With h = j (see Figure 7c) we must also evaluate (3-5) 1/2 and (4-5) 1/2 . 

x (l+x) 1/2 


2- 5 1-87 

3- 5 2-12 



Then (1 +x) l/2 dx a *(i)(l-73+2(l-87)+2(2-00)+2(2-12) +2-24) 

= i[7-97 + 2(l-87 + 2-12)] 

= i(15-95) 


= 3-988. 

This last number has been rounded to three decimal places since this is 
one digit more than is required in the final result. (We could have no 
confidence in any figures after the second decimal place in the final result 
because of the round-off errors in the values of (1 +x) 1/2 .) 

With h = i (see Figure 7d), we must obtain a few more values of the 
integrand 

x (l + x) 1/2 

2-25 1-80 

2- 75 1-94 

3- 25 2-06 

3*75 2-18 
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Figure 7d 2 3 4 

J 4 (\+x) l ' 2 dx i(i)(l -73 + 2(1-80) + 2(1 -87)+2(1-94)+ 

2(2-00) + 2(2-06) + 2(2-12) + 2(2-18) + 2-24) 
= £[15-95+ 2(1-80 +1-94 + 2-06+2-18)] 

= *(31-91) 

— 3-989 rounded to three decimal places. 

At this stage it seems safe to assume that further applications of the 
process with h = ... will all yield the value 3-99 rounded to two 

decimal places. 

Hence we now have 

| (l + x) ll2 dx = 3-99+0-01. 

(Note that, correct to three decimal places, the true value of the integral is 
3-989.) 

Example. Evaluate 

P dx 

J1.5X-I 

correct to one decimal place using the trapezoidal rule. 

If the values of l/(x — 1) are rounded to k decimal places, the modulus 
of the resultant error in the value of the integral is less than or equal to 
(3 — 1-5)^10 = f 10“*. We now attempt to choose k so that this error will 
not affect the value of the integral when rounded to one decimal place. 
This should be the case if |10 - * does not exceed 0 005 = ^10 -2 . The 
smallest integer value of k to satisfy this condition is 3. Thus we should 
evaluate the necessary values of l/(x — 1) correct to three decimal places. 
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We shall obtain a first approximation to the value of the integral by 
taking h = §. 



x x— 1 

l/(x-l) 

- 

1 i i 

3 2 

2-000 

0-500 

Then 

dx 

4 — i(f) (2*000 4-0*500) 

15*~1 



II 

■Nw 

1y 

Oi 

8 



= 1*88 


rounded to two decimal places. 

Now take h = f. We must calculate an additional value of the integrand. 


X x— 1 

l/(x-l) 


2i li 

0*800 

Then 

(Jx 

, ~ i(|) (2-000+2(0-800)+0-500) 

'l-sx-l 


= 1(4-100) 



= 1-54 


rounded to two decimal places. 

Now take h — §. We must calculate a few more values of the integrand. 


X x— 1 

l/(x 1) 


2| if 

1143 

0-615 

Then 

^ dx 

~ i(|)(2-000 + 2(1-143 + 0-800 4- 0-615) 4- 0-500) 

1-5 X-l 


= ^(4*100 + 2(1-143 + 0615)) 
= *(7-616) 

= 143 


rounded to two decimal places. 
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Now take li = *. We must calculate a few more values of the integrand. 


Then 


X 

x — 1 

l/(x-l) 

1U 

ft 

1455 


life 

0-941 

2* 

life 

0-696 

2ft 

1ft 

0-552 


-i(*)(7-616+2(l-455+0-941+0-696+0-552)) 
= *(14-904) 


= 140 


rounded to two decimal places. 

Now take h = *. Once again we must calculate a few more values of 
the integrand. 


X x — 1 l/(x —1) 


Then 


if! 

if 

1-684 

m 

25 

32 

1-280 

iff 

31 

32 

1-032 


1* 

0-865 

2*i 

lit 

0-744 

2U 

iff 

0-653 

2ft 

iff 

0-582 

2ff 

iff 

0-525 


dx 

- r - ?{tz) (14-904 + 2(1 -684 +1-280 +1 -032+0-865+0-744 

1-5 X— 1 

+0-653+0-582+0-525)) 

= *(29-634) 


= 1 39 


rounded to two decimal places. 

At this stage it seems safe to assume that further applications of the 
process with h = *, if®,... will all yield the value 1-4 rounded to one 
decimal place. 

Hence, correct to one decimal place 


f 3 dx 

J 1-5 X-l 


= 14. 


(The value of the integral correct to three decimal places is 1-386.) 
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6.1.5. Flow diagram 

In this subsection we give a simple flow diagram for applying the 
trapezoidal rule. Readers familiar with computer programming should be 
able to use this as a basis for writing a program. 
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The reader should modify the above flow diagram so that it allows 
for a successive halving of the interval size h and subsequent re-evaluation 
of the integral until convergence is reached to some specified tolerance 
(as in the last example). 

EXERCISES 

1. From the following correctly rounded data calculate approximations to the value 
of the integral ]o 8 f(x)dx using the trapezoidal rule with h = 0-8, 0-4, 02, 0*1. 

x 0-0 0-1 0-2 03 04 0*5 06 07 0*8 

f(x) 0*0000 0*0499 0*0995 0*1483 01960 0*2423 02867 03290 0*3688 

2. Use the trapezoidal rule to obtain approximations to 

f 1 dx 

J o 1 + x 

taking (i) h = 05, (ii) h = 0*25 and (iii) h = 0125 and working to five decimal 
places. Estimate round-off and truncation errors in each case. 

3. Determine h so that the trapezoidal rule will give the value of 



correct to three decimal places. How many decimal places should be retained in 
the values of 1/x? 

Evaluate the integral correct to four decimal places. 

6.2. Simpson’s Rule 

Once again the problem is to determine an approximation to \af(x)dx 
where the values of the function / are known for a sequence of equally 
spaced values of x between x = a and x — b (Figure 8). 



Figure 8 
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Instead of joining each consecutive pair of points (x r , f r \ ( x r+i , f r+1 ) 
by a straight line to obtain an approximation to the graph of y = f(x) 
as in the derivation of the trapezoidal rule, we shall join the three points 
( x O’/o)’ ( x vfi) an d ( x 2> fd by a parabola (that is, a curve of degree two), 
the three points (x 2 , fi\ (x 3 , / 3 ) and (x 4 , / 4 ) by another parabola, and so 
on up to the last three points (x„_ 2 , /»- 2 ), (x„-i, /„- 1 ) and (x„, /„) 
(Figure 8). Note that this implies that n must be even, that is, that the 
number of points is odd. Note also that, using straight lines for the 
approximation, we could join each pair of consecutive points by a straight 
line, since it only requires two points to define a straight line uniquely. 
However, if we use parabolas (or curves of degree two) then it is necessary 
to take the points in threes since it requires three points to define a 
parabola uniquely. 

As in §5.2 the parabola through the points A(x 0 , /o), B(xi, /j), 
C(x 2 , / 2 ) has the equation 


y = /o + 


x-x 0 A , , (x-x 0 )(x-xi) a2 


4/0+ ~ 


2 h 2 


A /o- 


Then, with this approximation to /(x) in the interval from x 0 to x 2 we 
see that 


f(x)dx ~ 


/o + 


, (x-x 0 )(x-xi) a2 


4/o+~ 


2 h 2 


A 2 fo>dx 


= iM/o4-4 / 1 + / 2 ).* 


Similarly 


I: 


/(x)dx~^(/ 2 + 4/ 3 + / 4 ). 


Hence 

b 


f(x)dx 


-f 


f(x)dx 


= /(x)dx + /(x)dx+***+ f(x)dx 

Jx 0 Jx 2 JX„-2 

— iM/o + 4/i +/ 2 ) + jh (/ 2 + 4 / 3 +/ 4 )-h' * •+i ^(/«-2 + 4/„_i + i /„) 
~ 3 M /0 + 4 / 1 + 2 / 2 + 4 / 3 + 2 / 4 + * • - + 2 / n _ 2 + 4/ n _ j +/„). 


This is Simpson's rule and is of course exact if the curve represented by 
y = f(x) is a parabola or a series of parabolas as in Figure 8. 


For details of the integration see Appendix 3. 
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Example. Use Simpson’s rule to obtain from the data of the example 
on page 55 an approximation to Qf(x)dx. 

Here h = 0-5 and so, from Simpson’s rule, we obtain 

j 4 /(x)dx ~^x0-5(l-7321+4(l-8708)+2(2-000)+4(2-1213) + 2-2361) 

= 4 x 05(23-9366) 

= 3*9894 rounded to five significant digits. 

This result is of limited value until we can estimate its accuracy. 


6.2.1. Effect of round-off on Simpson*s rule 

With the same notation as before, the error in the integral due to errors 
in the given data is equal to 

jh(eo + 4ei +2^2 4-' * * + 4e n ~ i +e n ). 

Hence the modulus of this round-off error is less than or equal to 

jh(\e o| +4|ei| +2|e 2 | +■ * * + 4|e„_i| + \ e n \). 

Now if each of the absolute errors \e r \, r = 0, 1, 2,..., n, is less than or 
equal to \e\ 9 then the modulus of the resulting error in the integral is less 
than or equal to 

iMM +4kl +2\e\ + • • - + 4\e\ + \e\) = i/z(2 + 4(in) + 2(^- l))\e\ 

= nh\e\ 

= (b-a)\e\. 

Hence, if all the function values have been rounded-off to k decimal 
places, \e\ and the modulus of the corresponding error in the 

integral is less than or equal to ^{b — a)10~ k . Note that this is the same 
expression as was obtained for the trapezoidal rule. 

Example. Estimate the round-off error in the result of the last example 
due to possible round-off errors in the given function values. 

The modulus of the error in the integral, due to the round-off errors in 
the given function values is less than or equal to ^(4 —2)10“ 4 = 0*0001. 

6.2.2. Truncation error in Simpson's rule 

If the function / is sufficiently differentiable it can be shown, using Taylor 
series, that the modulus of the truncation error for the integral of / from 
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a to b is less than or equal to 


jhh 4 (b-a)\f^)\ 

where l/ iv (£)l = max {|/ iv (x)|; x ^b}. 

Further, if it is known that / iv is always positive in the interval from a 
to b , then the truncation error will be negative, while if / iv is always 
negative then the truncation error will be positive. 

Once again, the smaller the value of h , the smaller will be the modulus 
of the maximum truncation error. 


Example. Estimate the truncation error in the value of j*(l +x) i/2 dx 
obtained in the example on page 69. 

When f(x) = (l + x) 1/2 , f'{x) = i(l +x)~ 1/2 , f"(x) = -|(1 +x) _3/2 , 
f"'{x) = !(l + x)- 5/2 and r(x) = -M(H-x)- 7/2 . 

Clearly then / iv has its maximum numerical value in the interval 
2 ^ x ^ 4 when x = 2; that is, £ = 2 and so 


l/ iv (£)l = 


15 1 15 

16 J 3 7 43273' 


Hence the modulus of the truncation error 


^jkoh^ib-a^m 


= lio(0-5) 4 (4-2) 


15 

43273 


~ 0 000016. 


But, from the form of / iv it is obvious that / iv is always negative for 
2 < x ^ 4 so that the truncation error will be positive. 

Thus, the truncation error will lie between 0 and 0-00002. Comparing 
this truncation error with the round-off error for the same problem 
(obtained in the previous example) we see that in this case the round-off 
error is dominant (and so the truncation error may be neglected). 

Hence 3-9893 < J* (l+x) 1/2 dx < 3-9895. 

Therefore J (1 + x) 1/2 dx = 3-989 to three decimal places. 

(The true value of the integral to five decimal places is 3-98946.) 

It must be emphasized that it will very rarely be possible to carry out 
an analysis similar to that of this last example because of the difficulties 
in evaluating f n {f). 
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6.2.3. Application of Simpson's rule 

As for the trapezoidal rule, if the truncation error can be readily estimated, 
then we can determine a suitable value of h and the number of decimal 
places which should be retained in the function values in order to attain 
a given accuracy in the value of the integral. However, quite often it will 
be impractical to evaluate the truncation error. In such cases, we calculate 
successive approximations to the integral using the intervals h, \h, £h ,.. . 
until the required accuracy is attained (as for the trapezoidal rule). This 
method will be illustrated in examples. 

Example. From values of (1 +x) 1/2 correct to two decimal places use 
Simpson’s rule to obtain an approximation to the integral \\{l +x) 1/2 dx. 
(The values of (1 + .x) 1,2 used in this example are given in the example on 
page 61.) 

The modulus of the error in the integral due to round-off errors in the 
values of (1 +x) 1/2 is less than or equal to (4 — 2)^10 2 = 0-01. Hence in 
the final result any digits after the second decimal place may be 
meaningless. 

Now, taking h = 1, we obtain 

j 4 (l +x) 1/2 dx ~ i(l-73+4(2-00)+2-24) 

= £(11-97) = 3-99. 

Then, with h = £, we obtain 

'4 

(1 +x) 1/2 dx ~ £(£)(l-73+4(1-87)+2(2-00)+4(2-12) + 2-24) 

J2 

= £(3-97 + 2 x 2-00+4(1-87 + 2-12)) 

= £(23-93) = 3-988 rounded to three decimal places. 

With h = £, we obtain 

J 4 (l + x) 1/2 dx ^ £(£)(l-73+4(1-80)+ 2(1-87)+4(1-94)+ 2(2-00)+4(2-06) 

+ 2(2-12)+4(2-18)+ 2-24) 

= ^{3-97 + 2(2-00+1-87 + 2-12) 

+4(1-80 + 1-94 + 2-06 + 2-18)} 
= tz( 47*87) = 3*989 rounded to three decimal places. 
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At this stage it seems safe to assume that further applications of the process 
with h = Y6, ... will all yield the value 3*99 rounded to two decimal 

places. 

Hence we now have 

| (l+x) 1/2 dx = 3-99±0-01. 

Example. Evaluate 

* 3 dx 

% 1*5 X 1 


correct to one decimal place using Simpson’s rule. 

As in the example on page 63 we see that we should evaluate the 
necessary values of l/(x —1) correct to three decimal places. (The 
evaluations obtained in that example will be used here.) 

With h— we obtain 

I 3 — - dx ^ £(f) (2-00+4(0-800)+0-500) 

Ji-sx-l 

= £(2-500+4(0-800)) 

= *(5*700) = 142 to two decimal places. 


With h = f, we obtain 


fa 


15 X— 1 


dx ~ j(f) (2*500 + 2(0*800) + 4(1*143 + 0*615)) 


= ^(11*132) = 1*39 to two decimal places. 


With h = Ye, we obtain 
r 3 i 


1.5X-1 


Hence 


dx - i(W (2-500 + 2(0*800 +1 ■ 143 + 0*615) 

+ 4(1 455 + 0*941 + 0*696 + 0*552)) 
= x^(22T92) = 1*39 to two decimal places. 

3 1 


; X 1 


dx = T4 correct to one decimal place. 


6.2.4. Flow diagram 

The flow diagram for applying Simpson’s rule is similar to that for 
applying the trapezoidal rule given in §6.1.5 and the reader should not 
have too much difficulty in carrying out the appropriate modifications. 
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Having obtained a flow diagram for Simpson’s rule similar to that given 
in §6.1.5 for the trapezoidal rule, the reader should then modify it to allow 
for a successive halving of the interval size h and subsequent re-evaluation 
of the integral until convergence is reached to some specified tolerance. 


EXERCISES 

1. From the following correctly rounded data calculate approximations to the value 
of the integral Jo f(x)dx using Simpson’s rule with h = 2, 1, ? and compare the 
results with the rounded true value of 1*3971. 

x 0*0 0-5 1*0 1*5 2*0 2*5 3*0 3*5 4*0 

f(x) 0*0000 0*2423 0*4401 0*5579 0*5767 0*4971 0*3391 0*1374 -0*0660 

2. Use Simpson’s rule to obtain approximations to 

M dx 
Jo 1+X 

taking (i) h = 0*5, (ii) h = 0*25 and (iii) h = 0*125 and working to five decimal 
places. 

Estimate round-off and truncation errors in each case. Compare the results with 
those obtained in Exercise 2 on page 67. 

3. Determine h so that Simpson’s rule will give the value of 

r 3 9 1 / 

I -dx correct to three decimal places. 

How many decimal places should be retained in the values of 1/x? 

Evaluate the above integral correct to three decimal places. 

6.3. Comparison of the trapezoidal rule and Simpson’s rule 

We have seen above that the effect of round-off errors on the two methods 
is similar. Using the trapezoidal rule the truncation error is proportional 
to h 2 , while using Simpson’s rule the truncation error is proportional to 
/z 4 . But for values of h which are less than 1, /z 4 is smaller than h 2 and so 
in general, unless/" is very small or/* v is very large, we expect Simpson’s 
rule to be more accurate than the trapezoidal rule for a given value of 
h < 1. 

Example. With h = \ estimate 

P—Urfx 

Jo 1+X 2 

using (i) the trapezoidal rule and (ii) Simpson’s rule. 
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Compare the results of (i) and (ii) with the value 0-785 which is the true 
value of the integral rounded to three decimal places. 

If = ]7j~2 then 

/( 0 ) = 1 
m =o-8 

/(l) = 0-5. 

Note that all these function values are exact (and not subject to round¬ 
off errors). 

(i) f 1 -l^dx^i0(l+2(0-8)+0-5) 

Jo 1 

= 1(3-1) = 0*775. 

Note that this result is free from any round-off error. 

(ii) f 1 j-t-^dx ~ M)(l + 4(0-8)+0-5) 

Jo 1 + x 

= g(4-7) = 0-783 rounded to three decimal places. 

Comparing both of these approximations with the true value (to three 
decimal places) of 0-785 it is clear that Simpson’s rule has given the better 
approximation. 


EXERCISES 

1. From the given correctly rounded data evaluate the best approximation you can 
to using (i) the trapezoidal rule and (ii) Simpson’s rule. 

x 0 0-2 0-4 0-6 0-8 1-0 1-2 1-4 1-6 

e* 1 1-2214 1-4918 1-8221 2-2255 2-7183 3-3201 4-0552 4-9530 


How many figures would you trust in your answers? 

2. Evaluate Jg' 8 tan x dx correct to three decimal places using (i) the trapezoidal rule 
and (ii) Simpson’s rule. 

3. Evaluate 


f 2 x+1 

Jo X 2 + 1 


dx 


correct to two decimal places using (i) the trapezoidal rule and (ii) Simpson’s rule. 
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6.4. The method of undetermined coefficients 

In this section we consider an important and widely applicable alternative 
approach to the derivation of integration formulae. 


6.4.1. Newton-Cotes-type formulae 

We shall look here for integration formulae which express the approxi¬ 
mation to the definite integral J*/ (x)dx in the general form 

*m + mh) + yf{2h) + --- + E 

where a., P,y, ... are undetermined coefficients and E is an error term. 
This is a perfectly reasonable starting point since all we are setting out 
to do is obtain an approximation to the integral as a linear combination 
of the values of the integrand. (Note that we have already seen that both 
the trapezoidal rule and Simpson’s rule are of this form.) Formulae of the 
above form are called Newton-Cotes formulae. 

In particular, considering the formula 

J f{x)dx = a/(0) + £ 

we see that the error term can be made zero for the function f(x) = 1 (and 
so for all constant functions) if a is chosen appropriately as follows: 

J f(x)dx = a/ (0) + E. 

Now put f(x) = 1. 

Therefore J ldx = a + E. 


But 


'h 

1 dx = h and so E will be zero if a = h. 
o 


Thus we have the approximate integration formula 

J f(x)dx ~ hf(0). 

which is in fact exact if f{x) — constant (Figure 9). When/(x) ^ constant 
the error term £(# 0) can be found using Taylor series. 
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6A. 1.1. The trapezoidal rule. Consider the formula 


i: 


f(x)dx = a/(0)+ /?/(/!) + £. 


We now have two undetermined coefficients, a and ft. Thus the error term 
E in this formula can be made zero for f(x) = 1 and for f(x) = x by 
choosing a and (1 appropriately as follows: 


i: 


1 dx = h. 


Therefore E in the above formula will be zero when f(x) = 1 if 

oc + P = h. 


Also 


xdx = [?x 2 Jb = hh 2 . 


Therefore E in the above formula will be zero when f(x) = x if 

a x 0 + ph — \h 2 


since /(0) = 0 and f(h) = h. 

Therefore P = ?h and a = \h. 

Hence we have the approximate integration formula 

- W(0 )+W(h) = ih(/(0)+/(/!)). 
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If f(x) is any linear function, say f(x) = lx Fm where l and m are constants, 
then 


f(x)dx=\ (lx + m)dx 

o Jo 

= [±lx 2 + mxf 0 
= \lh 2 + mh 
= %h(lh + 2m) 

= \h(m + (lh + m )) = jh(f(Q) + f{h)). 


Hence the formula is exact for all linear functions. 

When f(x) is not linear, the truncation error term E(^ 0) can be found 
using Taylor series. 

Now put u = x — xq in the integral 


Jx, 


f{x)dx 


in which X\ = x 0 + h. 
Therefore 


where F(u) — f(u + x 0 ). 




f(x)dx = f(u + x 0 )du 


1 


= F(u)du 


But 


Hence 


i: 

ji 


F{u)du ~ i*i[F(0) + F(fi)] 

= i^[/( x o) + /(^ + x o)] 
= ?h[f(x 0 ) + f{xi)]. 

f(x)dx i/i[/(x 0 )+/(xi)]. 


This is the trapezoidal rule. 
Then 

’’Xn 

/(x) dx 

)X 0 

f(x)dx + J* f(x)dx- t- 


A 1IW11 

j>-j. 

-c 


• ••+ f(x)dx 
J x n — 1 

- ih[/(x 0 )+/(xi)]+ih[/(xi)+/(x 2 )] + - • - + ih[/(x n -i)+/(x„)] 


= ih(/ 0 + 2/ 1 + 2/ 2 + - • •+2/„_ 1 +/„). 
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Example. Show that the trapezoidal rule is not exact for a quadratic 
function. 

Take f(x) = kx 2 + lx + m. 

Then we know that the trapezoidal rule is exact for the expression (lx + m) 
and so we need only consider the integral of g(x) — kx 2 . 

*h rh 

g(x)dx = kx 2 dx = k[}x 3 Jb 
Jo Jo 

= ifcfc 3 . 

But \hlg(0)+g{h )] = $h(kh 2 ) = jkh 3 . 

Therefore J g(x)dx ^\_g(0)+g(h]i] 

and so the trapezoidal rule is not exact for a quadratic function. (Note 
that in this case the truncation error E is jkh 3 -%kh 3 = — £ kh 3 = —fak 3 g", 
compare §6.1.2, page 56.) 

We can of course derive other formulae of the form 

m h 

f(x)dx = a/(0) + Pf(h) + yf(2h) + ••• + £ 

Jo 

in the same way as above. 

For example, choosing a, ft and y appropriately in 

j* f{x)dx = af (0) + J 8f(h) + yf(2h) + E 

we can obtain a formula for which the error term E is zero for all poly¬ 
nomial functions f(x) with degree < 2 (that is, for constants, linear 
functions and quadratic functions). Such formulae are not in common use, 
however, except in special circumstances, and so will not be considered 
further. 


6.4.1.2. Simpson's rule. Now consider the formula 

% 2h 

f(x)dx = am + mh) + yf(2h) + E. 

Jo 

We now have three undetermined coefficients a, ft and y and we can use 
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these to ensure that the error term E is zero for f{x) = l,/(x) = x and 

fix) = x 2 . 

% 2 h 

ldx = 2 h. 


j: 


Hence E will be zero in the above formula when f(x) = 1 if 

x+P + y = 2h. 


( 1 ) 


Jo 


xdx = [WTo h = 2 h 2 . 


Then E will be zero in the above formula when /(x) = x if 
a x 0 + f}h + y(2h) = 2 h 2 
that is, if p+2y = 2h. 


( 2 ) 


Also 


*2 h 

Jo 


xHx = [ix 3 ]g h = fh 3 . 


Therefore E will be zero in the above formula when /(x) = x 2 if 
a x 0 + ph 2 + y(2h) 2 = f/t 3 
that is, if i?+4y = ffc. 


(3) 


Now, solving the three simultaneous equations (1), (2) and (3) for a, P 
and y, we obtain oi = jh,P = $h and y = \h. 

Hence we have the approximate integration formula 


r 2/t 

Jo 


f(x)dx ~ W(0)+W(h)+W(2h) 
= ih(/(0)+4/(h)+/(2h)). 


If /(x) is any quadratic function, say /(x) = kx 2 + lx + m where fc, / and m 
are constants, then 


/* 2h 2h 

f(x)dx = ( 

Jo Jo 


(fcx 2 + lx + m)dx 


= [jfcx 3 + 2 ^X 2 + ffix]^ 

= ffcfc 3 + 2lh 2 + 2mh 
= ^(/(0) + 4/(h)+/(2h)). 
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Hence the formula is exact for all quadratic functions (and so for all linear 
functions or constant functions). 

Now consider f(x) = x 3 . 

C2h 

I x 3 dx = [ix 4 ]o h = 4 h 4 

= &(4h 3 +(2h) 3 ) 

= ih(f(0)+4f(h)+f(2h)). 

Hence the formula is also exact for f{x) = x 3 and so (as can be shown) 
for all cubic functions. Thus we have an unexpected “bonus”. 

Now consider /(x) = x 4 . 

• 2h 

xUx = [ix 5 ]^ - ¥h 5 . 

Jo 

But ifc(/( 0) + 4f(h) + f(2h)) = ift(4/i 4 + 16/z 4 ) 


Therefore 


% 2h 

x 4 dx # ^h(f (0)+4/ (h) +f (2h)) 

0 


and so the formula is not exact for a quartic function. (Note that in this 
case the truncation error E is — i 5 = —rsh 5 = h 5 f ™; compare 
§6.2.2, page 69.) 

Hence the formula is exact for all polynomials of degree less than or 
equal to three. For other functions the form of the truncation error E can 
be found using Taylor series. 

Now put u — x — xq in the integral 

f f(x)dx 
J xo 

in which x 2 = Xi + h = x 0 + 2h. 

Therefore f f(x)dx = f f(u + x 0 )du 

J*o Jo 

l*2h 

= F(u)du 


where F(u) = f(u+x 0 ). 
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But 


2 h 


F(u)du ^ [F(0) + 4F(h) + F(2/i)] 

o 

= I*[/(xo) + 4/(*o + fc)+/(xo+2fc)] 
= ^hlf(x 0 )+4f(x 1 )+f(x 2 )l 


Hence 


i: 


f(x)dx a- i/i[/(x 0 ) + 4/(xi)+/(x 2 )]- 


This is Simpson’s rule. 


i 


Then f(x)dx = 


f(x)dx 


(**2 f *4 r Xn 

= f(x)dx+ I f(x)dx + -"+ f(x)dx 

JX 0 J x 2 J x n~2 

* i/»[/(x 0 )+4/(x 1 )+/(x 2 )] +i/»[/(x 2 )+4/(x 3 )+/(x 4 )] 
+ •”+& [/(Xn- 2 ) + 4/(x„-i) + /(x„)] 


= iM/o + 4/i + 2/ 2 +4/3 + • • • + 2/„- 2 + 4/„_! +/„). 


We can of course derive other formulae of the form 



f(x)dx = a/(0) + PAh) + 7/(2fc) +' * ‘ + F 


in the same way as above. Such formulae are not in common use, however, 
except in special circumstances, and so will not be considered further. 


6.4.2. Euler-Maclaurin formulae 

The method of undetermined coefficients can also be applied to obtain 
approximate integration formulae in terms of the values of the integrand 
and its derivative at the equally spaced tabulation points. Such formulae 
(involving integrand derivative values as well as integrand values) are 
called Euler-Maclaurin formulae. 

To illustrate the procedure we shall apply the method of undetermined 
coefficients to derive a formula of the type 

jV(x)4x = a/(0) + Wh) + xi /'(0) + pj'(h) + E 
in which a, a 2 , /?, /?i are constants and E is an error term. 
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We have four undetermined coefficients a, a i? /?, p x so that the error 
term E in the above formula can be made zero for f(x) = 1, /(x) = x, 
/(x) = x 2 and /(x) = x 3 by choosing these coefficients appropriately. 

% 

ldx = h. 

Jo 

Therefore E in the above integration formula will be zero when /(x) = 1 
and /'(x) = 0 if 

a + (1 = h. 


Jo 


xdx = t/i 2 . 


Therefore £ in the above integration formula will be zero for /(x) = x 
and f\x) — 1 if 

hft + tti+fii = ^h 2 . 


i: 


x 2 rfx = 


Therefore E in the above integration formula will be zero for /(x) = x 2 
and /'(x) = 2x if 

h 2 p + 2hPi 

m h 

x 3 dx = */t 4 . 


*h 

Jo 


Therefore E in the above integration formula will be zero for /(x) = x 3 
and /'(x) = 3x 2 if 

h i p + 3h 2 p 1 = i/i 4 . 

Then we obtain, on solving the simultaneous equations for a, a 1? /?, fi x , 

<x = ih, P = jh, ai = tj/i 2 , Pi = -rih 2 . 

Hence we have the approximate integration formula 


•/ 

Jo 


/(X)dx * i/i(/(0) + /(/i)) + ^/z 2 (/'(0)-/X/z)). 


If /(x) is a polynomial of degree less than or equal to 4, then this formula 
is exact. 

It is not difficult to deduce from this result that 


/% 

X 

JXi 


f{x)dx ca jh(fo + 2fi + 2/2 + * * • + 2f n _ 1 + /„) + ~fah 2 (fo — /„') 
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in the notation defined earlier. Thus we have obtained the trapezoidal rule 

with an extra “correction term” rih 2 (/o — /*')• 

If the function values have been rounded to k decimal places and the 
derivative values have been rounded to k\ decimal places, then the 
modulus of the round-off error in this method is (compare the trapezoidal 
rule, page 56) less than or equal to 

— Xo)10 k + Tjh 2 10 kl . 

The truncation error can be investigated for twice-differentiable 
functions using Taylor series. 


Example. Use the formula derived above and values of (1 + x) 1/2 correct 
to two decimal places to obtain an approximation to the integral 
J^(l + x) {fl dx. 

1 

We have f(x) = (1 + x) 1/2 and so /'(*) = 2 (1 + X ) 1/ I * 


Now the modulus of the round-off error in (1 + x) 1/2 is less than or equal to 
^10” 2 and so the modulus of the error in ^ ess ^ an or ec l ua l 

t0 ^7T1 \ (see § 2 - 63 ’ P a 8 e 16 >- 
2(1+ x) 

Hence the modulus of the error in each of /'(2) and /'(4) is less than or 
equal to i(il0 -2 ) < 0001. 

Therefore we shall use values of /' to three decimal places. Then with 
h = 2 (which is the largest possible value we can use) the modulus of the 
round-off error in the formula will be less than or equal to 


^(4 —2)10” 2 + —(0-001) =s 0-0103. 


Note that this error will become even closer to 0-01 as h is decreased. The 
required values of (1 + x) 1/2 are given in the example on page 61. 

1 

x (1+x) 1 ' 2 2(l+x) 1/2 

2 1-73 0-289 

4 2-24 0-223 
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(1 +x) l ' 2 dx ~ i(2)(l-73 + 2-24)+^(2) 2 (0-289-0-223) 
= 3-992. 

With h = 1 we obtain 


r 


(l+x) 1/2 dx a ^(l)(l-73 +2(2-00) + 2-24) + ^(l) 2 (0-289—0-223) 


= 3-991 


rounded to three decimal places. 
Hence we now have 


£ 


(l + x) 1/2 dx = 3-99 ±0-01. 


EXERCISES 


1. Use the method of undetermined coefficients to evaluate (without first changing 
the origin of x) the coefficients a and in the integration formula 

**i 

f(x)dx = CLf(x 0 ) + ff(xi) + E. 

Jx o 

2. Use the method of undetermined coefficients to evaluate (without first changing 
the origin of x) the coefficients a, /? and y in the integration formula 


* * 


f(x)dx = af(x 0 )+Pf(x 1 )+yf(x 2 ) + E. 


3. Use the method of undetermined coefficients to evaluate the coefficients a, b , c in 
the integration formula 



~ 1/2 f(x)dx = (2hyi 2 {am + bf(h) + cf(2h)}+E. 


4. Use the method of undetermined coefficients to evaluate the parameters a and b 
in the integration formula 

J J(x)dx = hf(a) + hf(b) + E. 

Show that the error term E will be zero for polynomials of degree three or less. 
(This formula is an example of a Gaussian integration formula.) 


5, For the integration formula 


r 


f(x)dx ~ U(9f(0) + 19f(h) - 5f(2h)+f(ih)) 
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the modulus of the truncation error can be shown to be less than or equal to 
T Toh 5 f iv (£) where 0 ^ ^ 3 h. 

Use this formula to find, correct to three decimal places, $ 2 f{x)dx where 
f(x) is given by the following table and it can be assumed that |/ iv (x)| is very 
much less than 1. 

x 0 0*2 04 0*6 

f(x) 1*0000 0*8187 0*6703 0*5488 




Iteration 



7.1. Introduction 

If we require to obtain the roots of the equation 2x 2 — 4x — 3 = 0, we can 
make use of a well-known formula and obtain the roots in the form 
i(4± x /40). However, if we require to obtain the roots of the equation 
2x 4 + x 3 — 3x 2 + 4x — 1 = 0 or of the equation x = 2sinx, there is no 
known formula which we can apply to obtain these roots directly. In this 
chapter we shall consider how to obtain numerical approximations (to 
any required accuracy) to the roots of such equations. The method 
employed is called iteration. 

In an iterative process we start with an approximation x 0 to a root X 
and from this obtain another approximation x 1 from which yet another 
approximation x 2 is obtained and so on. For an effective process (for a 
particular root) the successive values (or iterates) xi, x 2 , X 3 , ... should 
become progressively closer to the root X. The process is continued until 
an approximation of the required accuracy is obtained. It is thus clear 
that for an iterative process we require both 

(i) a starting approximation x 0 and 

(ii) a method or formula for obtaining the approximation x„+i in terms 
of the approximation x„. 

7.2. Simple iteration 

If the equation whose roots we are attempting to find is fix) = 0, then 
frequently a starting approximation x 0 can be obtained from a sketch of 
the graph of /(x). 

If the given equation f(x) = 0 is rewritten in the form x = F(x), we 
can obtain the successive iterates Xi, x 2 , X 3 ,... as follows: 

Xi = Fix 0 ), 
x 2 = F(x 1 ), 
x 3 = F(x 2 ), 


x„+i = F(x„), 
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For example, consider the equation 

2x 3 — lx+ 2 = 0. 

A rough sketch shows that there is a positive root between 0 and 1 and 
another between 1 and 2 (Figure 10). 



Now the given equation can be rewritten 

x = f(x 3 + l) 

that is, in the form x = F(x) where F(x) = y(x 3 +1). We shall attempt to 
determine the root between 0 and 1 using the process defined by the 
starting approximation x 0 = 1 and the formula 

X n + 1 =$(x2 + l). 


We thus obtain 

Xi = f (l 3 +1) = 0-571 (rounded to three decimal places) 
x 2 = 3 ((0-571) 3 +1) = 0-339 (rounded to three decimal places) 

and so on. However the work is better laid out in tabular form as follows : 


n x n x2 7(*n +1) 


0 

1 

1 

0-571 

1 

0-571 

0-186 

0-339 

2 

0-339 

0-039 

0-297 

3 

0-297 

0-026 

0-293 

4 

0-293 

0-025 

0-293 

5 

0-293 




D 
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The process has converged to 0*293 (rounding to three decimal places). 
The exact root is in fact 1 — jy/2 = 0*292893 ... Greater accuracy in the 
iterated solution can be obtained by retaining more figures in the 
calculations. In the following four decimal places have been retained: 

n X n X 3 7 (X„ + 1) 

5 0-2929 0-0251 0-2929 

6 0-2929 

(Note that the value for x 5 used here was obtained by retaining four 
decimal places in the calculation of 7((0*293) 3 +1)). 

EXERCISES 

1. Use the iteration formula 

Y — i_ v 3 
Xn +1 2 X B 

and the starting value 0 to obtain an approximation to a root of the equation 

x 3 +x —7 = 0 . 

Continue the iterations until convergence is obtained using three decimal places. 

2. Use the iteration formula 

x„+i = 1 — sin x n 

and the starting value 1 to obtain an approximation to a root of the equation 

sinx + x —1 = 0 . 

Use three decimal places in the calculations. 

As we saw on page 87 the equation 2x 3 —7x + 2 = 0 has a second 
positive root between 1 and 2. We shall now attempt to obtain this root 
using the same iterative formula as before, namely x„+i = 4(x 3 +1) and 
starting value x 0 = 2. We now obtain (rounding the calculations to three 
decimal places) 

n x„ x 3 7 (x 3 -f 1 ) 

0 2 8 2-571 

1 2-571 16-994 5-141 

2 5-141 135-876 39-107 

This process is diverging and will never give the root between 1 and 2 no 
matter how many iterations are carried out. This illustrates the point that 
not all iterative processes converge. 
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The above iteration process is not, however, the only one we can devise 
for the given equation. The given equation can also be written in the form 


x = 


7x —2 
2x 2 


for x ^ 0, 


and so we now consider the process defined by the formula 

_ 7x„—2 
X ” +1_ 2x 2 

and a starting value x 0 . With x 0 = 2we obtain (rounding the calculations 
to three decimal places) 

lx„-2 


n 

x„ 

x 2 

7x„ —2 

2x 2 

2xl 

0 

2 

4 

12 

8 

1*5 

1 

1*5 

2*25 

8*5 

4*50 

1*889 

2 

1*889 

3*568 

11*223 

7*136 

1*573 

3 

1*573 

2*474 

9*011 

4*948 

1*821 

22 

1*709 

2*922 

9*963 

5*844 

1*705 

23 

1*705 

2*908 

9*935 

5*816 

1*708 

24 

1*708 

2*918 

9*956 

5*836 

1*706 

25 

1*706 

2*911 

9*942 

5*822 

1*708 

26 

1*708 






The process has at last converged although the third decimal place has not 
been completely determined. It could however be obtained by continuing 
the iterations retaining more significant figures in the calculations. The 
exact root is in fact 1 +j^J2 = 1*707106 — 

Before leaving this example it is worth mentioning that the given 
equation can also be rewritten in the form 


x = 


7x — 2\ 1/2 
2x / 


or in the form 


7x-2y /3 


so that the iterative formulae defined by 

f lx„ — 2 


Xn+ i — 


and 


x n +1 — 


2x„ 

7x„ —2 


might also be considered. 


1/2 

1/3 
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A comparison of the use of the different iterative formulae with different 
starting values is given in the following table: 

Iteration Starting 

formula value Result 


x»+i =f(x^ + l) 
lx n —2 


X„+1 = ■ 

2xl 

X«+1 = I 

(7x„ — i 


V 2x ” , 

■y — I 

(Tx„ — 2 

Xn + 1 — I 

l 2 


1/2 


1/3 


1 

2 

1 

2 

1 

2 

1 

2 


Converges to the root 0-292893 ... 
Diverges 

Converges to the root 1-707106 ... 
Converges to the root 1-707106 

Converges to the root 1-707106 ... 
Converges to the root 1-707106 ... 

Converges to the root 1-707106 ... 
Converges to the root 1-707106 ... 


7.3. Convergence of iterative processes 

As we have seen in the last section, the success of an iterative process 
for a particular root depends on 

(i) the starting value used and 

(ii) the iterative formula used. 

It would be useful if we could determine, before commencing the iterations, 
whether or not a particular formula and starting value are likely to yield 
iterates which converge to the required root. It would be useful also to 
know which of several converging processes for a given root is likely to 
converge fastest. 

A root A of the equation f(x) = 0 will satisfy /(A) = 0 and hence also 
A = F(A) since this is simply /(A) = 0 rewritten in a different form. Thus a 
root of the given equation is the x-coordinate of the point of intersection 
of the straight line y = x and the curve y = F(x). Consider Figure 11. 
These diagrams illustrate different iterative processes geometrically. In 
each case x 0 is the starting approximation. To obtain the next approxi¬ 
mation x 1 first find the intersection of the vertical through the point 
(x o ,0) with the curve y = F(x); then find the point of intersection of the 
horizontal through this point with the straight line y = x. The x- 
coordinate of this latter point of intersection is then xi = F(x 0 ). Similarly 
x 2 is obtained starting from xi and so on. Clearly the processes of Figures 
a and b are convergent and lead to the root A. However the processes of 
Figures c and d are divergent and cannot be used to determine the root A. 
What then is the essential difference between the processes of a and b and 
the processes of c and dl From Figure a we see that the slope of the curve 
y = F(x) is less than the slope of the line y = x. This implies that F'(x) < 1 
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Figure 11 


since the slope of the line y = x is equal to 1. From Figure c we see that 
the slope of the curve y = F(x) is greater than the slope of the line y = x. 
This implies that F'(x) > 1. In Figure b , since the slope of the curve 
y = F(x) is negative, we have that |F'(x)| < 1 and in Figured that \F'(x)\ > 1. 
In general it can be shown that a necessary condition for the convergence 
of the iterative process with formula x„+i = F(x n ) to a root k is that 
\F'(k)\ < 1. Further, it can also be shown that if F maps an interval /, 
containing a root A, into itself then | F(x) | < 1 in the interval is a sufficient 
condition for the convergence to k of the iterative process defined by the 
formula x n+1 = F(x n ) and any starting value x 0 in /. 

We shall now consider in the light of the above, the formulae devised 
earlier for use in iterative processes for the roots of the equation 
2x 3 — 7x + 2 = 0. 

(i) For the formula x n+1 = j(xjj +1) 

Fi(x) = j(x 3 +1) and so Fi(x) = fx 2 . 

Hence for values of x in an interval containing the smaller positive root 
(e.g. the interval [0, 1]) F'^x) < 1 and so this formula can be used with 
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starting value x 0 anywhere in this interval to determine this root. We have 
already applied this formula successfully taking x 0 = 1. The process is 
illustrated in Figure 12 a. Note that FT (0*2928 ...) = 0*0735 ... is very 
much less than 1 and the convergence of this process was rapid. However 
it is clear that if X represents the larger positive root (~ 1*7) then 
| F\(X)\ > 1 (F\(x) > 1 for all x > J® ^ IT) and so the formula 
x n+1 = F t (x n ) would not give a convergent process for this root no matter 
what starting value x 0 is used. We have seen that taking x 0 = 2 resulted in a 
sequence of iterates which diverged. The process is illustrated graphically 
in Figure 12 b. 




Figure 12 


(ii) For the formula x„+i = 2 

Zx n 

lx — 1 A —lx 

Fi{x)= and F ' 2{x)= ~2x r - 

It can be deduced that |F' 2 (x)| > 1 for all x in [0, f] and so in particular 
for the smaller positive root of the given equation. Hence the formula 
x„+i = F 2 (x„) is unsuitable (regardless of starting value x 0 ) for determining 
the smaller root. We have seen that taking xo = 1 the process did not con¬ 
verge to this root. The process is illustrated in Figure 13a. 

It can however be shown that |F 2 (x)| < 1 for all x in [T5, 2] and so for 
any starting value x 0 in this interval the formula x „+1 = F 2 (x„) will give 
a convergent process for the root of the given equation which lies in the 
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interval [1*5,2]. We have already applied this formula successfully taking 
x 0 = 2. The process is illustrated graphically in Figure 13b. 



(hi) 


For the formula 


F 3 (x) 


fix —2 
\ 2x 


*n+l 

\V 2 



1/2 


and 



The above formula cannot be applied in the interval 0 < x < j since it 
would involve the square root of a negative number. However it can be 
shown that for all x in the interval (f, j]* which includes the smaller root 
of the given equation, |F 3 (x)| > 1. Thus in particular |F 3 (x)| > 1 at the 
smaller root. Hence the formula x„+i = F 3 (x„) is unsuitable (regardless 
of the starting value x 0 ) for determining the smaller root. 

It can be shown that |F 3 (x)| < 1 for all x in the interval [f, 2] which 
contains the larger root. Hence for starting values in [§, 2] the formula 
x „+1 = F 3 (x„) will give a convergent process for the larger root. 


(iv) For the formula 


x 


M + 1 ~ 


7X 2 ~ 2) 1/3 


FJx) = 



and 



2 V' 3 

lx—2 J 


x in the interval (a, b] implies that a < x ^b. 
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It can be shown that | F 4 (x)| > 1 for all x in (f, f] and so in particular is 
greater than 1 at the smaller root. Hence the formula x„+i = F 4 (x„) is 
unsuitable (regardless of the starting value x 0 ) for determining the smaller 
root. 

It can also be shown that |F 4 (x)| < 1 for all x in the interval [§, 2] which 
contains the larger root. Hence for starting values x 0 in [§, 2] the formula 
x „+1 = F 4 (x„) will give a convergent process for the larger root. 

These results are summarized in the following table which should be 
compared with the table on page 90. 


Iteration 

Formula F'(x) 


x„+i = ?(x 3 + l) fx 2 


X„+l 


7x„ —2 
2x 2 


4 —7x 
2x 3 


X„+l 


f 7x„ —2\ 1/2 

_L 

( 2x \ 

V 2x„ J 

2x 2 ‘ 

\lx — 2J 


X„+l 


7x„-2 y* 


If 2 Y 3 

6\7x-2j 


Interval 


I 

|F'(x)| 

Results 

[0, i] 

< 1 

Converges to the 
root in / for any 
starting value in /. 

[11, oo] 

> 1 

Will not converge to 
the root in / no 
matter what starting 
value is used. 

[0, i] 

> 1 

Will not converge to 
the root in / no 
matter what starting 
value is used. 

[§,2] 

< 1 

Converges to the 
root in / for any 
starting value in I. 


> 1 

Will not converge to 
the root in / no 
matter what starting 
value is used. 

1 — 1 
NJ|U» 

TO 

l_l 

< 1 

Converges to the 
root in I for any 
starting value in /. 

».i] 

> 1 

Will not converge to 
the root in / no 
matter what starting 
value is used. 

[1,2] 

< 1 

Converges to the 
root in I for any 
starting value in /. 
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Example. Determine an approximation to the only real solution of the 
equation /(x) = 0 in which f(x) = x 3 — x 2 — x — 3. 

From the table 


X 

-3 

-2 

-1 

0 

1 

2 

3 

4 

fix) 

-36 

-13 

-4 

-3 

-4 

-1 

12 

41 


we can obtain a simple sketch of the curve represented by the given 
equation and from this we see that the required root must lie between 
x = 2 and x = 3. This is, of course, clear from the table itself. Since 
/(2) = -1 and /(3) = 12 the curve must cross the x-axis somewhere 
between 2 and 3; that is, there must be a root, X say, of /(x) = 0 in this 
interval. 

Rewriting the equation /(x) = 0 in the form 

x — x 3 —x 2 —3 

we consider the formula x „ + 1 = Fi(x„) 

where F i(x) = x 3 — x 2 — 3. 

Then F'i(x) = 3x 2 — 2x. 

For values of x in the interval [2, 3] the value of Fi(x) increases 
steadily from 8 to 21. Hence |Fi(x)| > 1 for all x in [2, 3]. Therefore this 
formula is not suitable for the determination of the root. 

Now rewrite the equation /(x) = 0 in the form 

i 1 3 

X = 1H-1-2 

x x 

and consider the formula x n+l = F 2 (x n ) 

1 3 

where F 2 (x) = 1H-\—y • 

x x 

1 6 

Then F 2 (x) =—y—r- 

X X 

For values of x in the interval [2, 3] the value of F' 2 (x) increases 
steadily from — 1 to — j. Hence |F 2 (x)| < 1 for all x in the interval (2, 3]. 
Therefore the formula x„+i = F 2 (x„) with starting value x 0 any value in 
(2, 3] will give a convergent process for the required root X. However if 
this root is close to 2, and this could also be deduced from the simple 
sketch of the curve /(x) = 0, then, since |F' 2 (2)| = 1, we might expect that 
the convergence of this method would be very slow. 
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Taking x 0 = 2-5 and rounding where necessary to three decimal places 
we obtain the results in the following table. 


n 

x„ 

1 

x„ 

1 

x 2 

(i+-+ 

V X n . 

0 

2-5 

0-4 

0-16 

1-88 

1 

1-88 

0-532 

0-283. 

2-381 

2 

2-381 

0-420 

0-176 

1?48 

23 

2-125 

0-471 

0*221 

2-134 

24 

2-134 

0-469 

0-220 

2-129 

25 

2-129 

0-470 

0-221 

2-133 

26 

2-133 

0-469 

0-220 

2-129 


(Note that the first and last columns in this table could be omitted.) 
The process now gets itself into a loop, if the calculations are still rounded 
to three decimal places, giving alternately the values 2-133 and 2-129. 
Hence we suspect that the given root lies somewhere between 2T33 and 
2T29, that is, that to two decimal places (or three significant figures) the 
root is 2T3. If more accuracy is required then the process must be 
continued retaining more figures in the calculations. 

Now rewrite the given equation as 

x = (x 2 + x + 3) 1/3 

and consider the formula x „+1 = F 3 (x„) 
where F 3 (x) = (x 2 + x + 3) 1/3 . 

Then F3(X) = 3(x 2 + x + 3) 2/3 • 

In the interval from 2 to 3 the numerator of F 3 (x) has maximum value 
7 while the denominator has minimum value 3(9) 2/3 . Hence the maximum 
value of 


n(x) < 3(9) 2/3 “ 3(81) 1/3 < 3(4) < U 

that is |F 3 (x)| < 1 for all x in [2, 3] and so the formula x „+1 = F(x„) with 
starting value x 0 anywhere in this interval will give a convergent process 
for the root which lies in this interval. 

n(2)= “ 0,385 


Also, 
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which is considerably less than F' 2 ( 2) and so, if the required root is close 
to 2 we might reasonably expect that |F' 3 (A)| < |F' 2 (A)| so that the process 
with formula x n+1 = F 3 (x„) should converge faster than the process with 
formula x n+ 1 = F 2 {x n ) to the root A. Taking x 0 = 2-5 and rounding where 
necessary to three decimal places we obtain the results in the following 
table. 


n 

Xn 

x 2 

(x 2 4- x„ + 3) 

0 

2-5 

6-25 

11-75 

1 

2-274 

5-171 

10-445 

2 

2-186 

4-778 

9-964 

3 

2-152 

4-631 

9-783 

4 

2-139 

4-576 

9-715 

5 

2-134 

4-554 

9-688 

6 

2*132 

4-546 

9-678 

7 

2*131 

4-541 

9-672 

8 

2-131 




Thus, as expected, this method has converged considerably faster than the 
other method. 

A very important point concerning iterative processes must be stated. 
This is that small errors in the calculations are not too important in that 
they will not normally prevent the process from converging to the required 
root. They may of course result in it being necessary to repeat the iteration 
process a few more times to obtain convergence. If we are very fortunate, 
they may even reduce the number of iterations required to obtain con¬ 
vergence ! Thus, when carrying out an iterative process, it is not normally 
necessary to incorporate any checks into the computations. Also, if a result 
is required to a large number of significant figures, then it is not necessary 
(and indeed is wasteful of effort if a computer is not being used) to use this 
number of significant figures in all the calculations right from the 
beginning. For example, we might start by using only 2 significant figures 
until the process has converged to this accuracy, and then start using 4 
significant figures and later 6 significant figures, and so on. 

It is also important to point out that although an iteration process 
x n +i = F(x n ) with starting value x 0 to determine a value A may converge, 
when working to a given number of decimal places, to a value X such 
that X = F(X) to this number of decimal places, it does not follow that 
X approximates A to the same number of decimal places. Thus, if a root is 
required to say d decimal places, then the iterative process should be 
continued until convergence is obtained to at least (d+1) decimal places. 
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7.3.1. Flow diagram 
A simple flow diagram for carrying out the process of iteration is given 
below. In this flow diagram t is a measure of the accuracy to which we 
want our final answer. The value of OLD X which is read in is the starting 
value x 0 for the iteration. NEW X is computed by evaluating the function 
F at OLD X. 



This flow diagram will be quite satisfactory so long as the iterative 
method being used converges. What will happen if the method does not 
converge? The reader should modify the flow diagram to take care of the 
case when the method being used does not converge. 

EXERCISES 

1. In exercise 1 in the last section we saw that the formula x„+i = j— with 
starting value 0 gave a convergent iterative process for a root of the equation 
x 3 + x _i = o Determine an interval / of the root such that, using the above 
formula and any starting value x 0 in /, we can guarantee convergence to the root. 

2. Would you expect the formula x n+1 = (i-xj/xj to give a convergent process 
for the root of the equation in exercise 1 for a sufficiently accurate starting 
value x 0 ? 
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Give reasons for your expectation. 

Take x 0 = 0*4 and evaluate the next three iterates xi, x 2 , x 3 . Work to three 
decimal places. 

(Hint: Consider the interval (0, i] containing the root). 

3. Would you expect the formula x n+1 = (i-x„) 1/3 to give a convergent process for 
the root of the equation in exercise 1 for a sufficiently accurate starting value x 0 ? 

Give reasons for your expectation. 

Take x 0 = 04 and evaluate the next five iterates Xi, x 2 , x 3 , x 4 , x 5 to three 
decimal places. 

(Hint: Consider the interval [^, %) containing the root.) 

4. In exercise 2 in the last section we saw that the formula x n+ i = 1 -sinx„ with 
starting value 1 gave a convergent iterative process for a root of the equation 
sin x + x — 1 = 0. Determine an interval I of the root such that using the above 
formula and any starting value x 0 in I we can guarantee convergence to the root. 

5. Show that the equation x 3 -10x-}-4 = 0 has one root in each of the intervals 
[-4, -3], [0, 1] and [2, 3]. 

Show also that the formula x„ +1 = +4) with starting value x 0 chosen in 

the interval [0, 1] will give a convergent process for the root in this interval. 

Is the above formula suitable for determining either of the other two roots? 

Obtain an iteration formula which, when applied with any starting value in 
the interval [-4, - 3], will converge to the root in that interval, and which when 
applied with any starting value in the interval [2, 3] will converge to the root in 
that interval. Determine each of the three roots correct to 3 significant figures. 

6. Show that the equation sinx —x+i = 0 has a root in the interval (0, jn). 

Determine the root of the given equation correct to four significant figures. 


7.4. Order of iterative processes 

We have seen that, if we are given an equation f(x) — 0, then it may be 
possible, by rewriting the equation in the form x = F(x) for different 
functions F, to obtain several different iterative processes associated with 
the given equation. We have also seen that some of these methods will 
converge to a particular root A of f(x) = 0 for sufficiently accurate starting 
values, while others will not converge to A, no matter how good a starting 
value is used. In this section we shall consider in more mathematical detail 
the convergence of an iterative process, and this will lead us on to consider 
ways in which the convergence can be speeded up so that it is not necessary 
to repeat the process so often to obtain a particular root. 

If A is a root of the equation f(x) = 0 then /(A) = 0. Then, if the iterative 
formula x n+1 = F(x n ) is to be used to determine the root A, we must have 
A = F(A). 

Let e n = x n —A, that is — e n represents the error in the nth iterate x n . 
Then x„ = A + e„ 
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Xn+ 1 A “f" Cfi + 1. 

X „+1 = F(x„) - F(i + e„) 

= F(A) + e II F(A)+i^a)+---, 
on using Taylor’s series expansion.* 

Hence A + +1 = F(A) + e n F\X) + i<? 2 F'(A) + ••• 

= A + e n F\X) + ^e 2 F'(A) + • • •. 

Therefore e n+1 = e„F(A)+^e 2 F'(A) + * • *. 

Now, if the process is convergent, there must be a stage in the iteration 
beyond which each successive error e n +i must be numerically less than 
the modulus of the previous error e n ; that is, | e„ +1 1 < | e n | and the successive 
iterates are getting closer and closer to the value of the root A. 

Then, when n is sufficiently large to make e n “small”, we have 

e n+1 ~e n F(X) if F(A)^0. 

In this case the process is said to be first order . Since for such a process, 
once n is large enough, an approximation to the error at any stage is 
obtained by multiplying the error at the previous stage by the factor F(A), 
it follows that a necessary condition for convergence is that |F(A)| < 1. 
Also, the rate of convergence will increase as |F(A)| decreases. Hence if 
|F(A)| is just less than 1, the convergence will be very slow and it will be 
necessary to repeat the process many times to obtain an accurate value 
of A. If |F(A)| is very much less than one, the convergence will be faster, 
and it should not be necessary to repeat the process so often in order to 
obtain the same final accuracy. 

If F(A) = 0 then we have 


e n +i — ie%F"(A) if F(A) # 0. 

In this case the process is said to be second order. For such a process an 
approximation to the error at any stage is obtained by squaring the error 
at the previous stage and then multiplying by the factor iF(A). Thus, once 
we have got close enough to the desired root, so that the error in our 
approximation is “small” (that is, less than one unit on some scale of 
measurement) the errors in the subsequent iterates will decrease very 
rapidly. 

In a similar way, third and higher-order processes can be defined. 


* See Appendix 1. 
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Now first-order processes have the advantage that they are generally 
associated with very simple formulae. Their disadvantage, however, is 
that their convergence will be relatively slow (compared with higher-order 
processes) so that, although the work involved in carrying out one step of 
the iteration may be quite small, it will generally be necessary to repeat 
this many times. Second-order processes on the other hand will generally 
involve more complicated formulae, so that the work involved for each 
step of the iteration is increased. Their faster convergence rates, however, 
mean that this will not have to be repeated so often. The formulae for 
third and higher-order methods are generally so complicated that the extra 
gain due to their even-faster convergence rates is more than offset by the 
added labour of computation at each step. Thus, in general, only first and 
second-order methods are widely used in practice. 

Example. Consider the methods which have already been found to 
converge (for suitable starting values) to the positive roots of the equation 
2x 3 — 7x + 2 = 0. 

Let Ai be the root between x = 0 and x = 1 and let k 2 be the root between 
x = 1 and x = 2. 

(i) The method x „+1 = f(x 3 +1) = F i(x„) 

converges, for suitable starting values, to Ai. 

Fi(A i) = jAj ^ 0 since A x # 0. 

Hence the method is first order for the root Ai. 

lx —2 

(ii) The method x „+1 = - - 2 = F 2 (x„) 

£x n 

converges, for suitable starting values, to A 2 . 

4._7 q _ 

F ^=~2xr^ 0 if 

Hence the method is first order for the root A 2 ( ^ y). 

fix -2\ 1/2 

(hi) The method x n+ , = ( J = F 3 (x„) 

converges, for suitable starting values, to A 2 . 

F'n i 1 ( 2Az V /2 1 

3( 2) 2X\ \7A 2 —2/ (21i(7/l 2 —2)) 1/2 # ' 
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Hence the method is first order for the root A 2 . 


(iv) The method 


x n + i = 


7x„ —2 


1/3 


= F 4 (x„) 


converges, for suitable starting values, to X 2 . 


It 2 
F\{k 2 ) = -[ 


2/3 


# 0 . 


6\7A 2 -2y 

Hence the method is first order for the root A 2 . 

Example. Assuming that the iterative process defined by the formula 
x „+1 = %(x n + a/x n ) in which a is a positive constant, and a given starting 
value x 0 converges, what will be the limiting value of x n as n becomes very 
large? 

Show that the method defined above is second order. 

If the process converges to the value 2, then we must have 

1=i ( A+ i 

(that is A = F(A) in the notation of this chapter). 

a 


Hence 


2a — A + 


2 


and so A 2 = a. 

Thus the process will converge to a value A which is the square root of a. 


Now 

Therefore 


F(x) = ±[x + - . 


F\x) = \ 1- 


and so 


F(A) = i( 1 ) = 0 since = a ■ 

A 


Hence the given method is at least second order. 

a 


and so 


F"i*) = x3 

a ... 

P"(A) = -3 # 0 since a is positive. 
A 


Hence the given method is second order. 
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1. Show that the following convergent iterative processes are both first order: 

(i) x„+i = 7 -x 3 with x 0 e [ 0 , 7 ] for a root of x 3 +x-i = 0 , 

(ii) x n +1 = 1 — sin x„ with x 0 e ( 0 , n) for a root of sin x+x — 1 = 0 . 

2. Show that the following convergent iterative processes for roots of the equation 
x 3 — 10x + 4 = 0 are first order 

(i) x „+1 = to(Xm + 3) with x 0 e [0,1], 

(ii) x„+i = (10x„—4 ) 1/3 with x 0 e[2, 3]. 

3. The iterative formula x „ +1 = jx„(3-axj) can be used to determine 1 /fa. Show 
that the process is second order and apply it to obtain l/fl correct to six decimal 
places. (Check your result against that obtained from tables.) 

7.5. The rule of false position 

Once again consider the determination of a root of the equation 

f(x) = 0. 

The rule of false position is best explained geometrically. In Figure 14 is 
sketched a part of the graph of the curve represented by the equation 
y = fix). The root of fix) - 0 which we are seeking is represented by the 
x-coordinate of the point P, the intersection of the curve represented by 
y = fix) with the x-axis. Using the rule of false position we start with two 
points A(a, /(a)) and B 0 (x 0 , fix 0 )) which are preferably on opposite sides 
of the x-axis and determine the x-coordinate, xi, of the point P i which is 
the point of intersection of the x-axis and the straight line AB 0 . 



Now, starting from points A and Bi(xi, fix 0) (on the curve y = fix)), 
we can determine xz, the x-coordinate of the point P 2 the point of inter- 
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section of the x-axis and the straight line ABi. Thus we can generate a 
sequence of values x 0 , x v x 2 , . . . which we hope will converge to the 
x-coordinate of the point P, that is, to the required root of the given 
equation. 

The method is, of course, not applied graphically but analytically. An 
equation for the line AB 0 is 


y-m 


/(*<>)-/(«) 
x 0 — a 


{x — a). 


Now this passes through Pi the point (xi, 0) 
and so 


f , , fix o)-/(a) 

/(a) = —~-r— (*i ~ 4 


Xq — CC 


Hence 

Similarly 


«/(xo)-x 0 /(a) 

f(xo)—f(ot) 

a/(xi)-x 0 /(«) 

f{xi)-f{ot) 


and in general, for n ^ 0, we have 


_ Gcf(x n ) — x n f(cc) 

Xn+1 ~ fM-m * 


Now regarding the point A as fixed, and so a and /(a) as constants, we 
see that the rule of false position is equivalent to the iterative process given 
by the formula 

y _ «/(*„)- 
n+1 m-m 

(which is of the form x n+ i = F(x n )) and a starting approximation x 0 . 

Choosing a different point A on the curve is equivalent to choosing a 
different constant a (and so a different constant /(a)) and leads to a different 
iteration formula. 

Note, from the form of the expression for x B+1 , that x„+i is obtained 
by linear interpolation between the points (a, /(a)) and (x„, /(x„)). 


Now, in our notation, 


a) 


and so 


(fix) - /(«))(«/'(*) - /(«)) - («/(*) - xf(a))f'(x) 
(f(x)—f(<x)) 2 
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Hence if X is a root of /(x) = 0 (and so f{X) = 0), 

_ /(«)(/(«) 

Fm -«7mF 

m-am+km 

m 

if /(a) # 0 . 

Therefore, in general, F(A) # 0 and so the rule of false position is 
equivalent to a first-order iterative method. 

Example. Use the rule of false position to determine approximations to 
the positive roots of the equation 

2 x 3 —7x + 2 = 0 . 

(Use three decimal places in the calculations.) 

Let fix) = 2x 3 — 7x + 2. 

We have seen already that there is a root of f(x) = 0 between x = 0 
and x = 1. Thus we shall take a = 0 and x 0 = 1. Then, since /(0) = 2, 
the formula for the iterative method corresponding to the rule of false 
position (with this choice of a) is 


if x„ # 0. 


— 2x„ _ — 2x„ 

/(x„)-2 2x 3 — 7x„ 


-2 


2x„ 2 -7 


Hence in our notation 


F(x) = 


-2 


2 x 2 —7 


8 x 

and so fW "(2 ?^7 F' 

Now 8x increases steadily from 0 to 8 and (2x 2 — l) 2 decreases steadily 
from 49 to 25 as x increases from 0 to 1. Thus the maximum value of 
F(x) for x in [0,1] occurs at x = 1 and is Therefore |F(;>c)| < 1 for x in 
[0, 1] and so the iterative process is convergent for any starting value x 0 
in [0, 1] and in particular for Xo = 1. 
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With x 0 = 1 the calculation proceeds as follows: 


n x n 2xl 2x1 — 1 


0 

1-000 

2-000 

-5-000 

1 

0-400 

0-320 

-6-680 

2 

0-299 

0-179 

— 6-821 

3 

0-293 

0-172 

-6-828 

4 

0-293 




Hence the process has converged (rounding to three decimal places). For 
a better approximation to the root (whose exact value is 0*292893 ...) the 
process must be continued retaining more significant figures in the 
calculations. 

Since the second positive root is known to lie between 1 and 2, let us 
consider the process obtained by taking a = 2 and xo = 1. Since /(2) = 4 
the corresponding iteration formula is 


if x n ^ 2. 
Now 


_ 2/(x„)-4x n 

X " +1 /(*„)-4 

4x^ — 18x„ + 4 
2Xn—lx n — 2 

= (x w -2)(4x^ + 8x n -2) 
(x„ — 2) (2x 2 + 4x„ +1) 

= 2 {2x„(x„ + 2)-l} 
2 x M (x M + 2)+l 


2 {2x(x + 2)-l} 
1 } 2x(x + 2) +1 


and so, after differentiating and simplifying we obtain 


F\x) = 


16(x +1) 
(2x 2 + 4x + l) 2 ’ 


Now for x in [1, 2] the maximum value of the numerator occurs at x = 2 
and is 48 and the minimum value of the denominator occurs at x = 1 and 
is 49. Hence the maximum value of F'(x) for x in [1,2] is certainly less than 
|f. Thus |F'(x)| < 1 for x in [1, 2] and so the process is convergent for any 
starting value x 0 in this interval and in particular for x 0 = 1. 
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With x 0 = 1 the calculation proceeds as follows: 


n x n 2x„(x n + 2) 


0 

1-000 

6-000 

1 

1-429 

9-800 

2 

1-630 

11-834 

3 

1-688 

12-451 

4 

1-703 

12-612 

5 

1-706 

12-645 

6 

1-707 

12-656 

7 

1-707 



Hence the process has converged (rounding to three decimal places). 
For a better approximation to the root the process can be continued 
retaining more significant figures in the calculations. 


EXERCISES 

1. Use the rule of false position to determine the positive roots of the equation 
x 3 — 10x + 4 = 0 correct to three significant figures. (Note, we have already seen 
in exercise 5, page 99, that the positive roots lie in the intervals [0,1], [2, 3].) 

2. Use the rule of false position to determine the root of the equation sin x + x - 1 = 0 
correct to two significant figures. (Note it is easy to see that the root lies in 
[ 0 , tc].) 

7.6. Aitken’s <5 2 -process 

We have seen that using simple (or first-order) iteration convergence can 
be very slow indeed and that using second- or higher-order (and generally 
more complicated) methods should result in faster convergence. Before 
considering second-order processes, however, we will look at a method by 
which the convergence of a first-order method can be speeded up. The 
method, known as Aitken’s <5 2 -process, is of wide applicability and can be 
used to speed the convergence of any (convergent) sequence of numbers 
and not simply those obtained from iterative processes. 

For a convergent first-order process x„+i = F(x n ) for the root A we 
have as in section 7.4, 

e n + i = F'(F)e„ + \F"(k)el +... 

= e„(A + E„) 

where A = F'(A) and E n = \F"(A)e n + ... and E n can be made as small as 
we please by making n large enough. We cannot calculate the constant 
A, as A is as yet unknown. 
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Then, in the same notation, e n+2 = e n +i(A+E n+ i). 
Hence, eliminating A , we obtain 


&n +1 ^n +2 


£«+l 


= E n -E n+ x - 0 


for suitably large values of n . 


Therefore 
and so 

Therefore 
and so 


^n + 1 2 - 0 

(2-x„ + i) 2 -(2-x„)(2--x, i + 2 ) - o. 


A (x„ + 2 ' + 1 + X„) - X„X„ + 2 ~ X, 


n+1 


2 ~ 


x n +2 2 x w + 1 T x n 


This estimate of X will be better than any of x n , x„+i, x„+ 2 . Now it is 
easily verified that this estimate can also be written 

(* n+ i -*„) 2 


x M — 


*n + 2 2x n + i + X n 


The numbers (x„+ 2 — x„+ 1 ) and (x„+ 1 - x n ) are the first central differences 
dx n+ 3 /2 and Sx n+1/2 respectively of x. Then the number 

{(X n + 2-*„ + l)-(*„+I”**)} = { dx n + 3/2- Sx n + l/2} 

is the second central difference <5 2 x w+ x of x. Thus the above result can now 
be written 

(<5*„ + i /2 ) 2 


x„- 


(S 2 x n+1 ) 


Hence the name “<5 2 -process”. The method of application is as follows: 

From a starting value x 0 compute the first iterate Xi = F(x 0 ) and thence 
the second iterate x 2 = F(xi). Now calculate x^ where 

X 2 X 0 -Xi (^X 1/2 ) 2 

= ^ = --• 

X 2 + X 0 —2X! 0X 1 

(The best way of calculating this is by first calculating the differences 
Sx 3/2 and 5x lt2 and then the difference S 2 x 1 .) 

Now, using xk 1} as a starting value, calculate the next two iterates 
x^ 1} = F(x\> } ) and x^ 1} = F(x\ 1) ). Then calculate 




where dxVA = and (SM 1 ’ = (xi 1, -xi 1) )-(xi 1) -xi> 1) ). 
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AITKEN’s d 2 - PROCESS 

In this way generate the sequence {x^}. This sequence will then tend to 
the root X faster than the (first-order) sequence {x„}. It can, in fact, be 
shown that using the <5 2 -process along with a first-order method is 
equivalent to using a second-order method. The calculations are con¬ 
veniently recorded in tabular form as illustrated in the following example. 

Example. Consider again the determination of an approximation to the 
real solution of the equation x 3 — x 2 — x — 3 = 0 using the method 

i 1 3 

x„+ 1 = 1 H-2 

X„ x„ 

and Aitken’s <5 2 -process (see page 95). 

As before we shall start with x 0 = 2*5 and round-off* to three decimal 
places. We then obtain : 


X 

Sx 

S 2 x 

(<5x) 2 

5 2 x 

xg» 

2*5 

1*88 

2*381 

-0*62 

0*501 

1*121 

0*343 

2*157 

2*157 

2*108 

2*150 

-0*049 

0*042 

0*091 

0*026 

2*131 


2*131 

2*130 

2*131 


Hence the result 2*13 to two decimal places has been obtained very much 
faster than when using the same method without the £ 2 -process. 

Using the <5 2 -process along with the method x „+1 = (x 2 + x„ + 3) 1/3 and 
again taking x 0 — 2*5 we obtain: 


x 


Sx 


d 2 x 


m 2 

S 2 x 


xi) } 


2*5 

2*274 

2*186 


-0*226 

-0*088 


0*370 2*130 

0*138 


2*130 

2*130 
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ITERATION 


Once again the result 213 to two decimal places has been obtained faster 
than when using the same method without the <5 2 -process. 

Example. Determine an approximation to the positive root of the 
equation f(x) = 0 where /(x) = x — 2 sinx, that is, of the equation 
x — 2 sin x = 0 (in which x is measured in radians). 

If rough graphs of the curves y = x and y = 2 sin x are drawn (Figure 
15) we readily observe that they intersect, and so x = 2sinx (or 
x — 2 sin x = 0) for a value of x greater than But since the given 
equation implies that sinx = \x and sinx is less than or equal to 1, we 
see that the root must be less than or equal to 2. Hence, if X is the required 
root, then < X ^2. 

Now the given equation can be rewritten in the form x = 2 sin x and so 
we shall consider the iterative method given by x„+i — F(x n ) where 
F(x) = 2 sin x. Then F'(x) = 2 cos x and so F'(X) / 0 since < X ^ 2. 
Therefore the above iterative method is first order. Also |F'(x)| < 1 for 
< x ^ 2 and so we have a neighbourhood of the root for which 
|F'(x)| < 1. Hence the above iterative process will converge to the root 
for sufficiently accurate starting values. Taking x 0 = 2 and using Aitken’s 
<5 2 -process along with the above iterative method we obtain the following 
results rounded to three decimal places: 


X 


5x 

S 2 x 

(dx) 2 

S 2 x 


2 sin 2 = 

2 

1-819 

-0-181 

0-120 

0-301 

0-109 

1-891 

2 sin 1-819 = 

1-939 




2 sin 1*891 = 

1-891 

1-898 

0-007 

-0-004 

-0-011 

-0-004 

1-895 

2 sin 1-898 = 

1-894 





1-895 

2 sin 1-895 = 1-896 
2 sin 1-896 = 1-895 


Hence we suspect that the required root lies somewhere between 1-895 
and 1*896. To three-figure accuracy then we have X ^ 1*90. Greater 
accuracy can, of course, be obtained by continuing the process and 
retaining more significant figures in the calculations. (This would require 
the use of better tables.) 
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EXERCISES 


1. Use the iterative formula x„+i =\—with starting value 0 and Aitken’s 
£ 2 -process to obtain the real root of the equation x 3 + x — \ = 0 correct to three 
decimal places. 

2. Use the iteration formulae x n +i = 1 — sinx„ with starting value 1 and Aitken’s 
<5 2 -process to obtain the real root of the equation sin x + x — 1 =0 correct to two 
decimal places. 

3. Use the iteration formula x „+1 = (10x„-4) 1/3 with starting value 2 and Aitken’s 
<5 2 -process to obtain the root of the equation x 3 - lOx-f 4 = 0 which lies in the 
interval [2, 3] correct to two decimal places. 

4. Use the iteration formula x„+i = i^(xJ+4) with starting value 0 and Aitken’s 
<5 2 -process to obtain the root of the equation x 3 — lOx 4-4 = 0 which lies in the 
interval [0, 1] correct to three decimal places. 

5. In the notation used in the text, show that 


x\} } = x 2 - 


(Xn + 2 X n + i) 

Xu +2 2x n + l H~ X n 


= x 2 - 


(<5x„ + 3 / 2) 2 
<5 2 x„+i 


6 . Construct a flow diagram for Aitken’s <5 2 -process. 


7.7. The Newton-Raphson Method 

In this section we shall consider a very important iterative method which 
is, in general, second order. 

Let x 0 be an approximation to the root X of the equation f(x) = 0 and 
let e 0 be the error in this approximation so that 

X = Xq + Pq- 
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ITERATION 


Then 0 = /(A) = f(x 0 + e 0 ) 

= f{xo) + eof\xo)+yir\xo) + ... 

on expanding in a Taylor series.* 

Hence, provided e 0 is “small”, 

f(xo) + e 0 f'(x 0 ) ^ 0 


and so 


e 0 ^ ~ 


f(x o) 
f\x o) 


provided /'(x 0 ) # 0. 

Thus an improved approximation to the root A is 




= x 0 - 


/(*o) 

/'(* o) 


and then a further improved approximation is 


X 2 = Xi — 


/(*l) 


provided e l cx 


4^4 is “small”, 

/'(x,) 


Hence we have the iterative formula given by 


x„+i = x„ 


fjx n ) 

f\x n ) 


or 

where 


x„+i = F(x n ) 


F(x) = x — 


fix) 

f\xY 


7.7.1. Geometrical interpretation 

In Figure 16 OA 0 = x 0 and B 0 = (x 0 , /(x 0 )) so that A 0 B 0 = /(x 0 ). 
AiB 0 is the tangent to the curve y = f(x) at the point B 0 on it. Therefore 
the gradient of the line AiB 0 is f\x 0 ). 


Hence 

and so 


fix 0 ) = 


A 0 B 0 

Ai Ao 


f(x o) 
Ai A 0 


Ai A 0 


fix o) 
f'ixoV 


See Appendix 1. 
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Therefore x i = x 0 - = OA 0 - A, A 0 = OAj 

that is, x x is obtained by determining the intersection with the x-axis of 
the tangent to the curve y = f(x) at the point (x 0 , /(x 0 )) on it. x 2 is 
obtained in a similar way, starting from the point (xi, /(xi)). In general, 
x n +i is obtained by determining the intersection with the x-axis of the 
tangent to the curve y = /(x) at the point (x„, /(x„)) on it. 

Example. Use the Newton-Raphson method to determine an approxi¬ 
mation to the real root of the equation x 3 — x 2 — x — 3 = 0. 

Using the above notation we have /(x) = x 3 — x 2 — x — 3 and so the 
Newton-Raphson method is given by 

f(x») _ . xi! —x%—x„ — 3 

Xfl+ 1 X H Xfj 2 r-s 1 

f(x„) 3x„—2x n — 1 

Starting with x 0 = 2-5 (as in the example on pages 95 and 109) and 
rounding to three decimal places throughout we obtain the following 
results: „ , 


X 

m 

/'(*) 

m 

rw 

2-5 

3*875 

12-75 

0-304 

2196 

0-571 

9-075 

0-063 

2133 

0-022 

8-383 

0-003 

2130 

-0003 

8-351 

0-000 
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ITERATION 


Hence the root is 2-13 correct to two decimal places. The number of 
iterations required here should be compared with the number required 
to obtain the same root in the examples on pages 95 and 109. 

If greater accuracy is required in the result, then the process must be 
continued, retaining more significant digits in the calculations. The process 
for this equation is illustrated graphically in Figure 17. 


y 




In the above example all the calculations were carried out to three 
decimal places. Clearly this was a little wasteful of effort since, for example 
at the last step, there was no need to calculate f'(x ) to four significant 
figures (that is, in this case three decimal places) when f(x) was only 
known correct to one significant digit. One significant figure would have 
been sufficient for f'(x). Also, at the previous step, it would have been 
sufficient to have evaluated f'(x) correct to only one decimal place, or two 
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significant figures since /(x) has only been evaluated to two significant 
figures. 

Example. Use the Newton-Raphson method to determine an approxi¬ 
mation to the positive root of the equation x — 2 sin x = 0. 

Here the method is given by 

x„ —2sinx„_ f(x n ) 

An+l X„ /v/ \ * 

1 — 2cosx„ j (x„) 

Starting with x 0 = 2, we obtain the following results: 


X 

2 sinx 

2cosx 

f(x) 

fix) 

fix) 

fix) 

2 

1-819 

-0-832 

0-181 

1-832 

0-099 

1-901 

1-892 

-0-648 

0-009 

2 

0-004 

1-897 

1-894 

-0-6 

0-003 

2 

0-002 

1-895 

1-896 

-0-6 

-0-001 

2 



Hence (as before) we obtain the value 1 -90 correct to two decimal places 
for the required root. If greater accuracy is required, the process must be 
continued, retaining more figures in the calculations. 


7.7.2. Convergence of the process 

Like other iterative processes the Newton-Raphson process has the form 
x n +i = F(x n ) and will not always converge. For this method 


F(x) — x — 


/(*) 

f'(x) 


and so 


F(x)= 1 — 


r(x)f f (x)-f(x)f"(x) _ f(x)f"(x) 


[/'(*)] : 


U’( X )V 


Hence to guarantee convergence, for any x 0 in an interval I containing 
the root X, we require that 

mm 

[/'W ] 2 

for all x in this interval (and that F maps the interval I into itself). In 
practice, however, it is often difficult to determine such an interval, and the 
following set of conditions can be shown to be sufficient to ensure con¬ 
vergence for any starting value x 0 in an interval a ^ x ^ b. 
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ITERATION 


(i) mm < o. 

This implies that f(a) and f(b) have different signs so that the curve 
represented by the equation y = f(x) crosses the x-axis at least once in the 
given interval, and so the given equation has at least one root in this 
interval. 

(ii) /'(x) / 0 for a ^ x ^ b. 

This implies that there are no stationary points of the curve y = /(x) 
in the given interval and so there is only one root of the equation /(x) — 0 
in this interval. 

(iii) f”{x) is either ^ 0 or ^ 0 for all x such that a < x < b. 

Thus /"(x) does not change sign in the given interval and so the curve 
represented by the equation y =f(x) is either concave up over the whole 
interval or concave down over the whole interval. 


(iv) 


m 


f\c ) 


< b — a where c denotes a if \f\a)\ < \f\b)\ and c denotes b if 


This implies that the tangent to the curve represented by the equation 
y = /(x) at the end of the interval a^x^b at which |/'(x)| is 
smallest intersects the x-axis within the above interval, so that successive 
iterates always lie within this interval. 


Example. Consider the application of the Newton-Raphson method to 
determine (a) the root of x 3 — x 2 —x — 3 = 0 which lies between 2 and 3 
(see the example on page 113) and ( b ) the root of x — 2 sinx = 0 which 
lies between and 2 (see the example on page 115). 

(a) We have /(x) = x 3 — x 2 — x — 3. 

(i) /(2) = -1, /(3) = 12 and so /(2)/(3) < 0. 

(ii) /'(x) = 3x 2 — 2x -1 = (3x + l)(x — 1) and so f\x) ^ 0 for 2 ^ x < 3. 

(iii) /"(x) = 6x —2 > 0 for all x such that 2 ^ x ^ 3. 

(iv) |/'(2)| = 7 and |/'(3)| = 20. 


Therefore |/'(2)| < |/'(3)|. 


Then 


/( 2 ) 


f\ 2)i 


= -<3-2. 


Hence convergence of the Newton-Raphson process is guaranteed for any 
starting value x 0 such that 2 ^ x 0 ^ 3. The results obtained with x 0 = 2*5 
are shown on page 113. 
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(b) We have f(x) = x —2 sinx. 

(i) f(jn) — j7t~2sinjn — jn — 2 < 0. 

/(2) = 2 —2sin2 > 0 since sin2 < 1. 

Therefore < 0. 


(ii) /'(x) = 1—2 cosx ^ 0 for y7r ^ x ^ 2 since cosx ^ 0 for x in this 
interval. 

(iii) f"{x) = 2 sin x > 0 for all x such that jn < x ^ 2. 

(iv) f'(jn) = 1 — 2cos^7r = 1 and /'(2) = 1 —2 cos 2 > 1 since cos 2 < 0. 


Then 


f(jx) 

f'fo) 


\jn-2\ 

1 


2-jn. 


Hence convergence of the Newton-Raphson process is guaranteed for any 
starting value x 0 such that jn < x 0 ^ 2. The results obtained with x 0 = 2 
are shown on page 115. 


7.7.3. Practical approach 

It is not always easy to find theoretically an interval surrounding a root 
and such that the Newton-Raphson process will converge to this root for 
all starting values in the interval. For this reason the following approach 
will often be found to be useful. We have already seen that the Newton- 
Raphson process for any distinct root is second order so that \F'(X)\ = 0 
and so certainly \F'(X)\ < 1. Hence, for all the functions F with which we 
will be concerned, there will be an interval /, surrounding 2, such that 
|F'(x)| < 1 for all x in I. Therefore, for any distinct root, the Newton- 
Raphson formula will give a convergent process when used with a starting 
value sufficiently close to A, that is, in the interval /. Hence the convergence 
of the process is seen to depend solely on the choice of starting value. (Note 
the difference between this situation and that which arose when con¬ 
sidering first-order methods. There a given formula might never give a 
convergent process for a particular root, no matter what starting value 
was used.) Therefore the only problem, using the Newton-Raphson 
process for a distinct root, is to obtain a starting value sufficiently close 
to the required root. This may be rather difficult. The procedure adopted 
is then 

1. Obtain an interval [a, b~] containing the required root. 

2. Choose a starting value x 0 in this interval. A suitable value might be 
obtained by applying the method of false position in the given interval; 
that is, use the value 

_ bf(a)-af(b) 

Xo m-m ' 
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ITERATION 


3. Start the iterations. Calculate xi, x 2 ,- 

4. If the process is not converging, begin again with a more accurate 
starting value x 0 . 


7.7.4. Order of the method 

The Newton-Raphson method for a root X of the equation f(x) = 0 is 

/(*) 


given by 1 = F(x n ) where F(x) = x 
Then F\x) = 


/'(*) 

mr\x) 


imr 

and so F'(X) = 0 (since f(X) — 0) unless f'(X) = 0. 

</'ix» 2 /■<*) + raimrm- 2 /(.<M/-(xi)' 
Then F (x) -TT^j?- 


and so 


F"(A) = 


f'W 

fW 


7 ^ 0 in general. 


Thus the Newton-Raphson method is generally a second-order method. 
Difficulties arise if f(A) = 0, that is, if A is a multiple root of the equation 
/(x) = 0. This case will be discussed later. 


7.7.5. Particular applications of the Newton-Raphson method 
We can find the square root of a positive constant a by using the 
Newton-Raphson method. The problem is to obtain the positive root of 
the equation x 2 — a = 0 or f(x) = 0 with f(x) = x 2 — a. 

Then the Newton-Raphson method in this case is given by 

x »~ a 

+1 ^ 



It can be shown, using the conditions of 7.7.2, that this method will con¬ 
verge to yja for any positive starting value x 0 . 

Now for the Newton-Raphson method we have 

„ ^ 1^2 17 "/ _ 1^2 f 

&n+ 1 — 2 e n^n\^) 2 * 
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But f(x) = x 2 — a and so 

1 2 { 2 \ e n 

Thus, since X > 0, all the e t for i = 1, 2, ... will be > 0. e 0 depends on 
the choice of starting value x 0 . Hence, no matter whether x 0 is chosen 
less than or greater than A, all the subsequent iterates will be greater than 
X , that is, all the iterates Xi, x 2 ,... will be overestimates of the root X. 

Example. Determine ^30 correct to 16 decimal places. 

We can start with any positive value for x 0 . However, since it is easily 
seen that ^30 lies somewhere between 5 and 6, we shall take x 0 = 5-5. 
Then \e 0 \ < 0*5. 

Now for the Newton-Raphson method we have 
1 =* jefF"(s l) = {el . 

In this case, f(x) = x 2 — 30 and so 


But here X > 5 and so < 0T^. 

Therefore an estimate of the error e\ in the first iterate x x is 

01(0-5) 2 ~ 0*03 (rounding to one significant figure). 

Hence we shall determine Xi to two decimal places. 

Then an estimate of the error e 2 in the second iterate x 2 is 

0*1(0*03) 2 ~ 0*0001 (rounding to one significant figure), 

and we shall calculate x 2 to four decimal places. 

Then an estimate of the error e 3 in the third iterate x 3 is 

0 * 1 ( 0 * 0001) 2 = 0*000 000 001 

and we shall calculate x 3 to nine decimal places. 

Then an estimate of the error e 4 in the fourth iterate x 4 is 

0 * 1 ( 0*000 000 001) 2 = 0*000 000 000 000 000 000 1 . 


E 
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ITERATION 


Hence, adopting the above procedure for xi, X 2 , x 3 , the fourth iterate x 4 
will certainly give ^30 correct to 16 decimal places. 

i( 3°' 

*1 = 2 *o+ — 


= 5*48 to two decimal places. 
30 ^ 

Xi 


*2 = 2 Xi + 


= 5*4772 to four decimal places. 



= 5*477 225 575 to nine decimal places. 


x 4 



= 5*477 225 575 255 166 1 to 16 decimal places. 


Hence to 16 decimal places ^30 is 5*477 225 575 255 166 1. 

Note that, if tables of square roots are available, then the value for 
obtained from them may be used as xo- This will probably result in fewer 
iterations being required to obtain a given accuracy. For example, from 
tables 730 - 5*477226 and so, taking x 0 equal to this value, only two 
iterations are required to give the value of yj 30 correct to 16 decimal 
places. (The reader should verify this for himself.) 

Similar processes can be derived to obtain the third, fifth, sixth roots, 
etc., of any given constant a. 

To determine the pth root of a we must solve x p = a or x p - a = 0. 
The Newton-Raphson method then gives 

x p n -a a + (p- l)xj 


Note that if p is even then a must be positive and x 0 must be chosen 
positive. If p is odd and a is negative, x 0 should be chosen negative. 


Example . Determine ^7 correct to four decimal places. 
The Newton-Raphson method is given by 

7 + 2x„ 
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With x 0 = 2 we obtain 


*i 


*2 


*3 


7 + 2(2) 3 
3(2) 2 

7 + 2(l*9) 3 
3(1*9) 2 

7-f 2(1*913) 3 
3(1*913) 2 


•9 (rounding to one decimal place). 


1*913 (rounding to three decimal places). 


= 1*9129 (rounding to four decimal places). 


Consideration of the error estimates, as in the last example, indicates that 
there is no need to continue the process any further. This could, of course, 
be confirmed by carrying out one further step. Hence ^7 is 1*9129 correct 
to four decimal places. 


7.7.6. Flow diagram 

The flow diagram for the Newton-Raphson process is the same as that for 
simple iteration (see §7.3.1). The function F will of course be different. 
For the Newton-Raphson process for the equation f(x) = 0, 


F(x) — x — 


/(*) 

/'(*)' 


EXERCISES 

1. Use the Newton-Raphson formula with starting value 0 to obtain, correct to three 
decimal places, the root of x 3 + x —\ = 0. 

2. Use the Newton-Raphson formula with starting value 1 to obtain, correct to two 
decimal places, the root of sinx + x — 1 =0. 

3. Use the Newton-Raphson formula to obtain, correct to four significant figures, 
the root of the equation sin x — x +\ = 0. 

4 . Find, correct to four significant figures, the only positive solution of the equation 
x 3 -f 2x —6 = 0 by elementary iteration and by the Newton-Raphson method. 

5. Show that the equation x = e~ x has a root in the interval [j, 1] and use the 
Newton-Raphson formula to determine this root correct to three decimal places. 
(The derivative of the function e~ x is — e~ x ) 

6. Use the Newton-Raphson formula to obtain 6 and ^217, each to six significant 
figures, and check your results against standard tables. 

7. Use the Newton-Raphson formula to devise an iteration procedure for the 
calculation of the reciprocal of a number without using the process of division. 
Hence evaluate yt to six significant figures and check your result with standard 
tables. 
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ITERATION 


8, Show that the equation x = 31nx has a root between 1*5 and 2 and another 
between 4*5 and 5. 

Evaluate these roots correct to three decimal places using (i) elementary 
iteration, (ii) the Newton-Raphson method. 

9. Show that the equation x 2 = sin nx has a root between x = f and x = f and that 

the iterative formula x n+1 = -sin -1 (x 2 ) guarantees convergence with any 

n 

starting value x 0 in [|, f] whereas the formula x n +i = (sin^x M ) 1/2 does not. 

Determine the root correct to three decimal places using (i) the above con¬ 
vergent process and Aitken’s <5 2 -process and (ii) the Newton-Raphson process. 

7.8. The solution of polynomial equations—the Birge-Vieta process 

In this section we develop a systematic method of applying the Newton- 
Raphson process to determine all the real roots of polynomial equations. 
Let p(x) denote the mth degree polynomial 

a m x m + a m -ix m ~ 1 +• * ' + CI 2 X 2 + a\X +do ( a m / 0 ). 

Using the Newton-Raphson process to determine a root of p(x) = 0 
we have 


pW 

%n +1 %n ,/ \ * 

p(x„) 

But p(x„) can of course be evaluated by the nesting method described in 
§ 3.1 by generating the sequence {b r }, r — m,m— 1,..., 1,0, where 

b m = a m , b m - k = x n b m - k+1 +a m - k for k = 1, 

Then p(x„) = b 0 (x n ). 

But, since p(x) may be expressed in the form 
p(x) = (x-x„)q(x) + b 0 

we see that p'(x) = q(x) + (x — x„)q'{x) 

and so p'(x„) = q(x„). 

However q is a polynomial of degree (n — 1) and so can be evaluated at 
x„ by generating the sequence {c r } where 

C m — bni . C m — k = X n Cm — A+ 1 H" b m — k for k i, 2, . . . , W — 1. 

Now in practice we will only be able to obtain approximate values for 
the coefficients of the polynomial q(x) unless X is known exactly. Thus we 
will only obtain a polynomial which approximates the reduced polynomial 
q(x). Hence, even if this polynomial could be solved exactly, we could 
only hope to obtain approximations to the remaining roots of the given 



THE SOLUTION OF POLYNOMIAL EQUATIONS 


123 


polynomial p(x). If the process is being repeated several times to obtain a 
number of the roots of the given polynomial, then errors of the above sort 
can build up. It is found that to obtain greatest accuracy the roots should 
be determined in increasing order of magnitude. 

Each step in the Birge-Vieta process may be conveniently recorded in 
a similar way to that used in the synthetic division process. 

For example, consider one step in the process to find a root of the 
equation 2x 3 + 7x 2 — 7x — 11 = 0 starting with the approximation x 0 = 1*5. 

Let p(x) = 2x 3 + 7x 2 — 7x —11. 


Then 


Xi 


P(*o) 
p'(x o) 


1-5- 


Pi 1-5) 
P\ 1-5)’ 


We evaluate p(l*5) by the synthetic division process 



2 

7 

-7 

-11 

1-5 


3 

15 

12 


2 

10 

8 

1 


Hence p( 1-5) = 1. 

Then we evaluate p'( 1-5) by carrying out the synthetic division process 
on the polynomial 2x 2 + lOx + 8. 



2 

10 

8 

1-5 


3 

19-5 


2 

13 

27-5 


Hence p\ 1-5) = 27-5. 

Therefore xi = 1-5 — ~ 1464. 

The middle lines of each of the above two tables can be omitted and the 
remainders joined to give one table. 

2 7-7 _ii 

1-5 2 10 8 1 

1-5 2 13 27-5 

For the general polynomial of mth degree a m x m + a m - 1 x m ~ 1 + • * • + 
aix + a 0 the step of the iterative process which obtains x n +1 from x n would 
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be recorded as shown: 


ITERATION 



a m 

@m— 1 

O-m — 2 

a 2 

ai 

do 


b m 

bm — 1 

b m -2 

b 2 

bi 

bo 

x rt 

Cm 

Cm — 1 

C m -2 

c 2 

Cl 


Then 



bo 

X „+1 = x n -. 





Cl 


The b r are calculated first using 

bm — a m , b m —k — x n b m —k+ 1 T Qm—k fbr k — 1,2,..., w, 

then the c r are calculated using 

c m = b m , c m —k = x n c m —k +1 + b m -k for k 1, 2,..., m 1. 

Example. Determine the two real roots of the polynomial x 4 —6x 3 — 
2x 2 + 58x — 50 correct to three decimal places, given that there is a root 
near each of 1 and —3. 


Since the roots should be calculated in increasing order of magnitude 
we shall attempt to obtain the root near x = l first. Thus take x 0 = 1. 
The calculation then proceeds as follows: 




1 

6 

-2 58 

-50 


1 

1 

5 

-7 51 

i 


1 

1 

4 

-11 40 


Hence 


Xi = 

+ 

= 0-975. 



1 

-6 

-2 

58 

-50 

0-975 

1 

-5-0250 

-6-8994 51-2731 

-0-0087 

0-975 

1 

-4-0500 

-10-8481 40-6962 



Hence 


x 2 = 0-975 + 


0-0087 

40-6962 


0-9752. 


Thus there is a root 0-975 correct to three decimal places. For the root 
near — 3 we consider the reduced polynomial 

x 3 - 5-0250x 2 - 6-8994x + 51-2731 


and take x 0 = —3. 
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The calculation then proceeds as follows: 


-3 

-3 


1 -5*0250 

1 -8*0250 

1 -11*0250 


-6*8994 51*2731 

17*1756 - 0*2537 

502506 


Hence x '“ + “- 2 '" 5 

Now we are very close to the required root, and so we shall carry out 
the next (and final) iteration using the original polynomial. This is 
advisable since the coefficients we have been using for the reduced poly¬ 
nomial are only approximate. 

We then obtain the following results: 

1 -6 -2 58 -50 

-2*995 1 -8*9950 24*9400 -16*6953 0*0024 

-2*995 1 -11*9900 60*8500 -198*9410 


Hence x 2 = -2*995+ = -2-995 correct to three decimal places. 

198*9410 

Thus there is a root — 2*995 correct to three decimal places. 

The remaining two roots of the given quartic are now approximately 
equal to the roots of the quadratic equation 

x 2 ~8*025x+17*176 = 0. 

The coefficients of this quadratic are, however, inexact and better 
approximations to them can be obtained by dividing the original poly¬ 
nomial by the factors (x — 0*975) and (x +2*995) as follows: 


1 -6 -2 58 -50 

0*975 1 -5*025 -6*899 51*273 -0*009 

-2*995 1 -8*020 17*121 


Thus we obtain the quadratic equation 

x 2 — 8*020x +17*121 =0 

which can now be solved to give the remaining two (complex in this case) 
roots of the original quartic. 
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ITERATION 


EXERCISES 

1. The equation x 3 - lOx + 4 = 0 has one root in each of the intervals [-4, —3], 
[0,1] and [2, 3]. Use the Birge-Vieta process to determine these roots correct to 
three significant figures. 

2. Show that the equation x 3 -5x 2 + 5x + 1 = 0 has one root between -1 and 0, 
another between 1 and 2, and the third between 3 and 4. Use the Birge-Vieta 
process to obtain all the roots correct to three decimal places. 

3. Obtain, correct to three decimal places, the only two real roots of the polynomial 

x 4 +2x 3 + 7x 2 — 11 = 0. 


7.9. Equations with nearly equal roots 

We now consider an equation of the form /(x) = 0 which has two nearly 
equal roots near x = A. In practice this will include the case of a double 
root at x = A. 


Then fW- 0 and /'(A) ^ 0. 


[For a double root at x = A we will have /(x) = (x—A) 2 q(x) and so 
/'(x) = 2(x-A)q(x) + (x —A)V(x). Hence /(A) = 0 and /'(A) = 0.] 

We see that for values of x close to the root A, /'(x) will become very 
small, and so the Newton-Raphson method will get into difficulty. This 
difficulty can be avoided by looking for a root of f'(x ) = 0 in a neighbour¬ 
hood of A. If /(x) = 0 has only two nearly equal roots near x = A, then 
/'(x) = 0 will have only a single root near x = A, and so the above difficulty 
of applying the Newton-Raphson method should not exist. From the 
approximation to the root of /'(x) = 0 we can then obtain approximations 
to each of the two corresponding roots of /(x) = 0 as follows. 

Let n be a root of /'(x) = 0 such that n+e (where e is small) is a root 
of /(x) = 0. 


Then fin + e) = 0. 

But, using Taylor’s series expansion,* we obtain 


f(fi+e) = f(fJ-)+sf(n)+ ie 2 /"(/i) + • • • 

= m+& 2 n»)+ 

since f'(ii) = 0. 

Hence, since s is small, 

0^/0x) + i£ 2 /» 

2m 


and so 


£ 2 ~ - 


f"(M )' 


See Appendix 1. 
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Now if fill) and f'(n) are of opposite sign, that is, if / is positive and has 
a maximum at x = ji (Figure 18a) or / is negative and has a minimum 
at x = ji (Figure 186) then 


2m 
n») 


>0. 


Hence a is real and the values (ju + e) and (/i — e) may be used as first 
approximations for the two roots of f(x) = 0. 



Figure 18 


If f{pi) and have the same sign, that is, if / is negative and has a 
maximum at x = n (Figure 19a) or / is positive and has a minimum at 
x = n (Figure 19b) then 


2 m 

f"(M) 


< 0 . 


Hence e is not a real number and the given equation f(x) = 0 does not 
have real roots near x = fi. 


H 



(a) 


(b) 



Figure 19 


Equations such as we are considering here are said to be ill-conditioned * 
since relatively small changes in the constants in the function / can 


* For a discussion of ill-conditioning in this context see Appendix 4, 
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ITERATION 


produce quite large changes in the roots. Conversely too, a relatively large 
change in x may have little effect on the value of the function /. 

For example, the quadratic equation x 2 — 1-498x4-0*561 = 0 has exact 
roots 0*75 and 0*748, while the quadratic equation x 2 - l*500x + 0*561 = 0 
has roots 0*789 and 0*711 correct to three decimal places. Also, evaluating 
the quadratic x 2 — l*498x + 0*561 at 0*749 gives —0*000 001. 

Example. Determine, correct to three decimal places, the two roots of 
the equation 30x 4 —45x 3 + 60x 2 —65x + 24 = 0 which are near 0*7. 

If /(x) = 30x 4 — 45x 3 + 60x 2 — 65x + 24 

we first attempt to find the (single) root of f'(x) = 0 near 0*7, that is, we 
look for a root of 120x 3 —135x 2 + 120x —65 = 0 near 0*7. Using the 
Birge-Vieta method we then proceed as follows. 

120 -135 120 -65 

0*7 120 -51 84*3 -5*99 

0*7 120 33 107*4 


5*99 

Therefore xi = 0*7 + —- 

~ 0*76 (rounding to two decimal places). 

Then, using this approximation to the root of f'(x ) — 0, we have, in the 
above notation, 

2 = 2/(0-76) 

£ /"(0-76) ’ 

Using the method of “nesting” to evaluate /(0-76) we obtain 


0*76 


30 -45 60 -65 24 

30 -22*2 43*128 -32*22272 -0*48927 


Hence to five decimal places /(0*76) is —0*489 27. 

Similarly, evaluating f"(x) = 360x 2 —270x +120 at 0*76, we obtain 


0*76 


360 -270 120 

360 3*6 122*736 


Hence 


/"(0-76) = 122*736 (exactly). 
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Therefore s 2 


2(048927) 

122-736“ 


~ 0 0080 (rounding to two significant figures). 

Hence e ~ ±0*09 (rounding to one significant figure). 

Therefore we should use 0*85 and 0*67 as starting approximations for 
determining the two roots of the given equation 

30x 4 - 45x 3 + 60x 2 - 65x + 24 = 0. 

We shall determine the smaller of the two roots first. 

Using the Birge-Vieta method the calculations then proceed as follows: 



30 

-45 

60 

-65 

24 

0-67 

30 

-24*9 

43*317 

-35*97761 

-0*10500 

0-67 

30 

-4*8 

40*101 

-9*10994 



_ , _ 0-10500 

Therefore x t = 0-67- - - ^ - 

~ 0-658 (rounding to three decimal places). 



30 

-45 

60 

-65 

24 

0*658 

30 

-25*26 

43*37892 

-36*45667 

0*01151 

0*658 

30 

-5*52 

39*74676 

-10*303 30 



0-01151 

Therefore x 2 = 0-658 

~ 0-6591 (rounding to four decimal places). 



30 

-45 

60 

-65 

24 

0*6591 

30 

-25*227 

43*372 88 

-36*412 93 

0*000 24 

0*6591 

30 

-5*454 

39*778 15 

-10*195 15 



0 000 24 

Therefore x 4 .0659i+ 5 ^^ 

^ 0-6591 (rounding to four decimal places). 
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Hence one root, correct to three decimal places, is 0*659. 

To determine the second root we continue the process with the reduced 
polynomial 30x 3 —25*227x 2 4- 43*372 88x —36*412 93 and x 0 = 0*85. 



30 

-25-227 

43*37288 

-36*41293 

0*85 

30 

0-273 

43*60493 

0*65126 

0*85 

30 

25-773 

65*51198 


Therefore x x = 

= 0-85 

0-65126 

65-51198 




~ 0*8401 (rounding to four decimal places). 
The next step is carried out using the original polynomial. 


30 -45 60 -65 

0*8401 30 -19*797 43*368 54 -28*56609 

0*8401 30 _ 5*406 47*91012 11*6832 0 

0*001 63 

Therefore *2 - 0-8401+ 

~ 0*8400 (rounding to four decimal places). 
Hence the second root, correct to three decimal places, is 0*840. 


24 

-0*00163 
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EXERCISES 

1. Find, correct to three decimal places, the positive roots of the equation 
x 3 — 12x +15 = 0. 

[It is easily seen that there is a root between x = 1 and x = 2 and another root 
between x = 2 and x = 3. Also, f'{x) becomes extremely small as x becomes close 
to 2. The problem should be treated as if there are two roots near x = 2.] 

2. Given that the equation 

x 4 — 1 -5x 3 + 4- 56x 2 — 6x 4- 2-25 = 0 

has two roots near x = 0-75, obtain these roots correct to three decimal places, 
assuming that the coefficients in the given equation are exact. 



The solution of 
systems of linear 
equations 



THE PROBLEM WE ARE CONCERNED WITH IN THIS CHAPTER IS THAT OF 
solving systems of linear equations such as the four by four system 

3*!— x 2 + 7x 3 + 4x 4 = 1 
2xx + 3x 2 + 5x 3 — 2x 4 = 3 
Xi + 2 x 2 — 3x 3 + 6x 4 = 5 
— 5xj + 4x 2 H~ x 3 — 2x 4 = 2. 


The general nxn system has the form 


flllXi + ^12A 2 + * * * + Clln^n ~ b\ 

&2 1^1 + <? 22*2 + ' * ' + ci2n x n = ^2 

a n 1X i T dn2^2 + * ’ ’ + dnn^n bn 

that is, n linear equations in n unknowns. We shall only consider systems 
for which a unique solution exists. In matrix notation then we have 
Ax = b where 



is a non-singular nxn matrix,* 



* A non-singular matrix is one whose determinant is not zero. 
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is a given n-dimensional vector and 



is an n-dimensional vector whose components it is desired to find. 

Problems of this type occur naturally in many branches of applied 
mathematics. Also, when solving more complicated problems, a first step 
is often to reduce the problem to one of the above type. 

There are two main classes of methods for solving linear systems of 
equations numerically, namely direct and iterative . Iterative methods are 
essentially an extension of the methods of the last chapter but, whereas 
there we carried out the iterative processes for a scalar, it is now a vector 
which is required. These methods, which are normally applied only when 
the matrix of the system is sparse (that is has many zero elements), are 
beyond the scope of this book. The remainder of this chapter will be 
concerned with direct methods, that is, methods which go directly to the 
solution in “one step”. This is in contrast to iterative methods which 
repeat a process over and over again until convergence (to within a 
specified tolerance) is obtained. Direct methods have a fixed predictable 
number of operations (that is, additions, subtractions, multiplications and 
divisions) but are affected by inaccuracies due to round-off errors. 

8.1. Gaussian elimination 

The method of Gaussian elimination which we shall describe first reduces 
any given system to a “triangular” system of equations which is solved 
by the method of back substitution. Firstly we illustrate back substitution 
by solving the following system of equations: 


3*1 — 2x 2 — 4x 3 = 3 
4x 2 + 3 x 3 = 2 
2x 3 = 4. 

From the last equation we obtain x 3 = 2, and then from the second we 
obtain 4x 2 = 2 —3x 3 = —4, and so x 2 = — 1. Finally from the first 
equation we obtain 3xi = 3 + 2x 2 + 4x 3 = 9 and so xi = 3. We now 
illustrate the complete process of Gaussian elimination in the following 
example. 
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Example. Solve the system 

4xi + 2 a 2 + 3%3 — 6 
5xi — 5 x 2 + 3x 3 = 3 
3xi — x 2 —2x 3 — 2. 

If we leave the first equation unaltered, then subtract £ times this equation 
from the second to obtain a new second equation, and £ times the first 
from the third to obtain a new third equation, we obtain the system 

4x 1 +2x 2 + 3x 3 =6 

— 7*5x 2 —0*75x 3 = “4*5 

— 2*5x 2 — 4*25 x 3 = —2*5 

Now if we leave the first two equations in this system unaltered and then 
subtract (^f) times the second equation from the third equation to obtain 
a new third equation, we obtain the system 

4x t + 2x 2 + 3 x 3 =6 

— 7*5x 2 —0*75x 3 — —4*5 

-4x 3 = -1 

This is a “triangular” system and so is easily solved, as above, to give 
x 3 = 0*25 

x 2 = -^(-4*5 + 0*75x3) = 0*575 
and xi = i(6 —2x 2 —3x 3 ) = 1*025. 

This illustrates the process of Gaussian elimination. The working can be 
conveniently and economically recorded in the following tabular form: 


right-hand 

left-hand coefficients coefficient 

-^ I. 


'an 

a i2 

Cli 3' 

bi 

4 

2 

3 

6 

5 

-5 

3 

3 

3 

-1 

-2 

2 


-7-5 

-0-75 

-4-5 


-2-5 

-4-25 

-2-5 



-4 

-1 


Roots 


1-025 


0*575 


0-25 
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Note that at each stage we do not write down again the coefficients of 
any equation which has remained unaltered from the previous stage. 
Now consider the general 3x3 system: 


^11*1 + ai 2 X 2 + Cl\ 3 X 3 = &1 
# 21^1 + # 22^2 + # 23-*3 = b 2 


^31*1+#32^:2+a 33 X 3 — b 3 


in which an ^ 0. (Note that since the equations have a unique 
solution we can always arrange that an # 0 by rearranging the given 
equations if necessary.) Eliminate xi from the second equation by 
subtracting m 2 i = a 2 i/fln times the first equation from the second 
equation. Similarly, eliminate xi from the third equation by subtracting 
m 3i — fl 3 i/fln times the first equation from it. Thus we obtain the system 


anXi +ai 2 X2 + ai3X3 — b 1 

a 22 X 2 + a 23 X 3 = b 2 l) 
a^lxi+a^Xi = b\P 


in which 

= a 22 -m 2 iai 2 , a^l = a 23 -m 2 iai 3 , b^ = b 2 -m 2 ib 1 , 

= a 32 —m 3 iai 2 , aQ = a 33 -m 3 iai 3 , b^ 1] = b 3 -m 3l bi. 

Now eliminate x 2 from the third equation in the above system by 
subtracting m 32 = times the second equation from it. (Note that 

if = 0 then it will be necessary to interchange the second and third 
equations before carrying out this step.) Thus we obtain the following 
triangular system, 


a\ 1 X 1 + a\ 2 x 2 + a\ 3 x 3 — b\ 


tf^x 2 + a^x 3 = b$ ] 

dflx 3 = b^ 2) 


in which 


afii = a^-m 32 a^ and b <* 2) = b< 3 ) -m 32 b i £\ 
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Hence x 3 = and then, from the second equation of the above 

system, we can obtain x 2 • From the first equation of the above system 
we can obtain x x . This latter process is the process of back substitution 
and the complete algorithm is known as Gaussian elimination. The 
equations of the final (triangular system are called pivotal equations and 
the coefficients an, a^, a$ are called the pivots for the process. 

In tabular form we have 


right-hand 

left-hand coefficients coefficient 


m ij 

an 

an 

> 

an 

bi 

1 

an 

an 

#13 

b x 

a 2 i/«n 

«21 

a 22 

a 2 3 

bz 

a 3 i/an 

#31 

#32 

#3 3 

b 3 

i 




w 





w 

1 




b?> 

Roots from 
back substitution 

Xi 

x 2 

x 3 = w/ctii 



Note that we have put mn = m 2 i = m 33 = 1. 

The process which we have described above for a three by three system 
is, of course, easily generalized to the case of higher-order systems. For 
the nx n system 


anXi +ai2X2 + ’ * *4"^i«x n — b\ 

a 2 i xi + a 22 x 2 H“ * * * + a 2n x n = b 2 


0 ni*i + a n2 x 2 + '' • + a nn x n — b n 


we proceed as follows: 

Eliminate xi from the second equation by subtracting m 2 1 = a 21 /an 
times the first equation from it. (The given equations should first be 
rearranged, if necessary, to ensure that an ^ 0.) Similarly eliminate xi 
from each of the following equations by subtracting mn — an!an times 
the first equation from the ith equation for i = 2, 3, ..., n. Thus we 
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obtain the system 


011*1 +012*2 + * ’ ‘ + 01n*/i ~ b i 

a&x2 + * • * + a^x„ = b$ ] 
0&*2 + '" + 0 &*k = b ^ 


ai 1 2 ) x 2 + - '• + a i n 1 n ) x n = b { n ] 

where a\j ] = a u — m n a Xj and b\ 1} = b t — m n b, for i = 2, 3, n and 
j = 2, 3,..., n. 

Now eliminate x 2 from each of the 3rd, 4th, ..., rcth equations of this 
last system by subtracting m i2 = aty/aftl times the second equation from 
the ith equation for i = 3, 4, ..., n. (The equations must again be re¬ 
arranged, if necessary, to ensure that a^l # 0 before carrying out this step.) 
Thus we obtain the system 


011*1+012*2 + ' ' ’ + 01 „X n = b i 
a l ih2 + '' • + a < iJx„ = b^ ] 
a&Xi + ■ • • + a$x n = M, 2) 
a^x 3 + • • • + <4 2 „bc„ = b2 ) 


a ( „Vx 3 + --- + ati ) x n = b ( n 2) 

where a{ 2) = a}]'—m i2 aSV, b\ 2) = b\ u —m^b^ for i = 3, 4, n and 
j = 3, 4, ..., n. 

This process is repeated until the following triangular system is 
obtained. 


011*1 +012*2 + ' ' * + 01n*« = b i 

a'i'hi + -b a^Xn = b ( 2 ] 

a' 3 lx 3 + • • • + a ( inX n = M> 2) 


a (n ~ X) x =b {n ~ 1] 

Then x„ = and hence x w -i, x„- 2 ,... , x 2 , *i are obtained in 

that order by back substitution. The equations of this last system are the 
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pivotal equations, and the coefficients are the pivots. The process can be 
summarized in tabular form as for the 3x3 case as follows: 


right-hand 

Multipliers left-hand coefficients coefficient 



Example. Solve the following system of equations working to three 
decimal places throughout. 

2-801xi-1-643 x 2 + 2-017x3 - 3*216 
0*762xi + l*372x 2 —2*631x3 = 1-835 
1-314x! -4*721x2 + 3*596x3 = 1*062. 

right-hand 


Multipliers 


left-hand coefficients 


coefficient 

rtiij 

r a t i 

Cli2 

tfi3 

bt 

1 

2-801 

-1*643 

2-017 

3-216 

0*272 

0-762 

1-372 

2-631 

1-835 

0-469 

1-314 

-4*721 

3-596 

1-062 

1 


1-819 

■3-180 

0-960 

-2-172 


-3-950 

2-650 

-0-446 

1 


- 

■4-257 

1-639 

Roots from 
back substitution 

1-340 

-0145 

■0-385 
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EXERCISES 

1. Use Gaussian elimination to solve approximately the following system of linear 
equations 

2xi— x 2 — 4x 3 = 4 
4xi+x 2 + 2x 3 = 4 
3xi +8x 2 — x 3 = 20 . 

Use three decimal places in the calculations. 

2. Use Gaussian elimination to obtain an approximate solution of the following 
system of linear equations 

3xi — 12x 2 + 8x 3 = 7 
4xi + 7 x 2 —2x 3 = 4 
7xi + 9 x 2 + 5x 3 — 11. 

Use three decimal places in the calculations. 

3. Use Gaussian elimination to obtain an approximate solution of the following 
system of equations 

7xi +4x 2 + 17x 3 = —85 
33xi + 16x 2 + 72x 3 = 359 
24xi + 10x 2 + 57x 3 = 281. 

Use three significant figures in the calculations. 

4. Use Gaussian elimination to obtain an approximate solution of the following 
system of equations 

0163xi + l*56x 2 — 8*10x 3 = 2-49 
-0-00154X!+0*796 x 2 + 4-69x 3 = 0*652 
6*40x 1 -0*378x 2 + 0*573x 3 - 0*0328. 

Use three significant figures in the calculations. 

5. Construct a flow diagram for the back-substitution process. 

6. Construct a flow diagram for the elimination process. 


8.1.1. Checking procedures 

In this section we shall describe two checking procedures which are useful 
when the calculations are being performed manually or using ? desk 
calculator. These checking procedures are superfluous if a computer is 
being used, since, once a program has been developed and thoroughly 
tested, it should run without error (apart from perhaps a machine failure!) 
for different systems of equations. 

The first check results in an extra column in the above tabular 
representation. We shall consider again the nx n system given in the 
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previous section and change the notation a little by putting b t = a in+1 
for i = 1, 2,..., n. The system of equations can now be written 

a ll X l + a i2 X 2 + " ' + a in X n = a in+l 
#21*1 +^22*2 + * ’ ’ + 02= 02«+l 


0nl*l + 0«2*2 + * * * + 0rm*« 0««+ 1 • 

n+ 1 

Now let 0 i «+2 = Z a iJ f° r 2, 

7=i 

that is, aj n+ 2 is the sum of all the coefficients in the ith equation or the 
sum of all the elements in the ith row of the table (excluding, of course, 
rriij). The elements a in + 2 for i = 1,2,..., n are now written as an extra 
column on the right of the table and are then operated on to obtain 
0 ^ + 2 , a$+ 2 , etc., in exactly the same way as the elements a tj for j = 2, 3, 

..., rc +1 are operated on to obtain a\)\ a\j\ etc., that is, 

<4 i +2 = a in+ 2 -m n a in+2 , a}l\ 2 = afj, ) +2 -"»i 2«^+2 
etc., for appropriate values of i. 

Therefore a[ 1 „ ) +2 = a in+2 -m n a ln+2 

h +1 n+ 1 

= Y Y &lj 

j= 1 7=1 

«+l «+l 

= Z Uhj-miaij)= Y “W- 

7=1 7=1 

Similarly 0 ^+ 2 = Z ^+ 2 = Z a uV etc. 

7=1 7=1 

Thus, at each stage of the process, the elements in the (n + 2)th column 
of coefficients should be equal, to within round-off errors, to the sum of 
the elements in the corresponding row (excluding of course the column of 
rriij). If at any stage, due to round-off errors, the elements in the (n + 2)th 
column obtained by the elimination algorithm is not exactly equal to the 
appropriate row sum, then it should be replaced by this (row sum) value 
for subsequent calculations. In this way the effectiveness of the check is 
not reduced by the accumulation of round-off errors. Thus this procedure 
checks each stage of the process separately. It provides a check on the 
elimination process. After we have described the other check we shall 
re-do the example on page 134 incorporating them both. 
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The second check, which we shall now describe, results in an extra row 
in the tabular representation. 

n 

Let Sj = L % for J = L2,+2, 

i = 1 

that is, the elements Sj are column sums of coefficients. These elements 
are written as an extra row along the bottom of the table. Now consider 
the scalar product of the vector (si, s 2 , ..., s n ) with the solution vector 
(xi, x 2 ,..., x n ) obtained after back substitution. 

This scalar product is equal to 

s 1 x 1 +5 2 x 2 + --- + 5 n x„ = i S jXj = i ( i aXj 
J= 1 J= 1 V=i / 

n / n \ n 

= X! ( &ij*j ) = X! ttin+1 

i = 1 \j=l / i=l 


— Sn + 1 • 

Thus, to within round-off error, the scalar product should be equal to 
5 W+1 . Now the above result can also be written 

+ + = X bi. 

Hence we see that this check merely states that the solution obtained 
should satisfy, to within round-off error, the equation got by adding 
together all the equations of the given system. This check is a check on 
therback substitution. Since s n+2 has also been calculated we can perform 
the following additional check. 

n +1 n +1 n n n +1 n 

Z s j = L Z«o = Z Z a n = Z a ‘»+2 = s„+ 2 - 

j= 1 1 1=1 i= 1 j= 1 i= 1 

Thus the sum of the first (n -f 1) of the Sj should be equal to s„ +2 . This check 
should be exact and not subject to round-off error since s n+2 is just the 

sum of all the coefficients in the given system of equations obtained by 

«+1 

summing the row sums, whereas £ Sj is the same quantity obtained by 

j -1 

summing all the column sums. 

We shall now re-do the example on page 134 incorporating the above 
checks. 
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right-hand Check 


Multipliers 

m ij 

' a n 

left-hand coefficients 

fl* 2 


coefficient 

#i4 

column 

&i5 

1 

4 

2 

3 

6 

15 

1 

4 


-5 

3 

3 

6 

3 

4 

3 

-1 

-2 

2 

2 

1 


-7-5 

-0-75 

-4-5 

-12-75 

1 

y 


-2-5 

-4-25 

-2-5 

-9-25 

1 



-4 

-1 

-5 

Roots 

1-025 

0-575 

0-25 



S J 

12 

-4 

4 

11 

23 


Note that the numbers 15,6 and 2 in the last (check) column are obtained 
by summing the coefficients in the corresponding rows of the table 
(excluding the m/j). The number —12-75 is obtained by evaluating 6 —£(15) 
and is then checked against the corresponding row sum (— 7-5 — 0-75 — 4-5). 
The number — 9-25 is obtained by evaluating 2—£(15) and is then checked 
against the row sum ( — 2-5 —4-25 —2-5). Similarly the number —5 is 
obtained by evaluating —9-25 — ^(—12-75) and is then checked against 
the row sum ( — 4—1). The numbers in the last (check) row are obtained 
according to their definition, that is, the 12 is obtained by summing 4, 5 
and 3, the —4 is obtained by summing 2,-5 and — 1, etc. 

Then Si*i+ s 2 - x 2 + 5 3 : * : 3 — 12(1-025) +(—4) (0*575)+ 4(0-25) 

- 11 
= s 4 . 

Finally Si+S 2 + S 3 +S 4 = 12 — 4 + 4+11 = 23 = ss. 

Note that since all the arithmetic required in the above example was 
carried out exactly (and so without introducing any round-off errors) all 
the checks have been satisfied exactly. 

Example . Solve the following system of equations working to three 
decimal places throughout and incorporating the checking procedures. 

2-801*!- 1-643x2 + 2-017x3 = 3-216 
0-762*! + 1-372x2-2-631x3 = 1-835 
1-314*1-4-721*2+ 3-596*3 = 1*062. 
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, 

right-hand 

Check 

Multipliers 

1 left-hand coefficients 

coefficient 

column 

m ij 

' a i\ 

a i2 

% ' 

a iA 

a i5 

1 

2-801 

-1-643 

2-017 

3-216 

6-391 

0-272 

0*762 

1-372 

—2-631 

1-835 

1-338 

0-469 

1-314 

-4-721 

3-596 

1-062 

1-251 

1 


1-819 

-3-180 

0-960 

=-0-400 -0-401 

-2-172 


-3-950 

2-650 

-0-446 

-1-746 




-4-257 

1-639 

=2+17 -2-618 

Roots 

1-340 

-0-145 

-0-385 



*J 

4-877 

-4-992 

2-982 

6-113 

8-980 


s 1 x 1 +s 2 x 2 + s 3 x 3 = (4-877)(l-340) — 4*992( — 0* 145) + 2-982( — 0*385) 
= 6-111 to three decimal places 
- 54. 

Si+s 2 + s 3 + s 4 = 4-877-4-992+ 2-982+ 6*113 = 8-980 = s 5 . 


EXERCISES 

Apply the checking procedures to the solutions of the exercises given at the end of 
the last sub-section. 


8.1.2. Multiple right-hand sides 

If it is required to solve a system of equations Ax = b for a given matrix 
A and a series of different right-hand-side vectors b , then a considerable 
amount of work can be avoided by including all the columns of right- 
hand sides in the one table and proceeding as before. If the system is 
n x n we first obtain a value of x n corresponding to each right-hand-side 
vector and then back substitution has to be performed for each right-hand 
side to give all the values of x„_i, x„_ 2 ,..., x 2 , xi. 

Now if there are m right-hand-side vectors 




to be considered, the ith element in the check column is £ a ij9 that is, 
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the sum of all the elements in the ith row. In the check row, the scalar 
product of the vector (si, s 2 , ■ ■ ■, s„) with the solution vector corresponding 
to the fcth right-hand-side vector must be equal to the sum of the elements 
of the fcth right-hand-side vector, that is, to s n+k , for each k. 

The table for the general 3x3 system with three different right-hand- 
side vectors is shown below. 


Multipliers 

left-hand coefficients 

_A_ 

right-hand coefficients 

_A_ 

Check column 

Mij 

0/1 

0i 2 

0i 3 ' 

' 0f4 

0i5 

a t 6 ' 

a il 

1 

011 

012 

013 

014 

015 

016 

6 

017 = Yj a U 
j= 1 

021 

W 2 1 =- 

011 

021 

022 

023 

024 

025 

026 

6 

027 = X 

J=1 

031 

m 3 1 = — 
011 

031 

032 

033 

034 

035 

036 

6 

037 = X 

J=1 

1 


dii 

m 

4U 

dii 

dii 

dii 




48 

dii 

m 

dii 

dii 

1 

dii 

dii 

dii 

dii 

dii 

Roots 

xi‘> 


xi> = 

d gl 
'd& 





xi 2 » 

X<2 2 > 

xi> = 

dii 

dii 





x?> 

x<i> 

x<> 3 > = 

dii 

dii 




Sj 

Si 

S2 

S 3 

S 4 

S 5 

S 6 

S 7 


To within round-off errors we should have 

SlM 1) + S 2 xS 1) + S3Xf| 1) = S 4 
Si4 2, + S2X<1 2) -|-S3xSs 2) = S 5 
SiX^ 3) + S 2 x!j 3 ) -|-S 3 ^ 3) = S 6 . 

Also Si +S 2 +S 3 +S 4 + S 5 -I-S 6 should be exactly equal to s 7 . 
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Example. Solve the set of simultaneous equations Ax = b k in which 



for the three right-hand-side vectors 



(Retain three decimal places throughout the calculations.) 


Multipliers 

m ij 

left-hand coefficients 

_ 

right-hand coefficients 

Check 

column 

a il 

' a n 

a i 2 

*13 

r a t 4 

a iS 

m 

1 

5 

1 

-3 

1 

0 

0 

4 

0*2 

1 

2 

4 

0 

1 

0 

8 

0-6 

3 

-1 

2 

0 

0 

1 

5 

1 


1-8 

4-6 

-0-2 

1 

0 

7-2 

-0-889 


-1-6 

3-8 

-0*6 

0 

1 

2-6 

1 

7-889 

-0-778 

0-889 

1 

'900L. 9000 

Roots from 

0-112 

0-142 

-0099 





back 

0-014 

0-267 

0-113 





substitution 

0-141 

-0-325 

0-127 





s i 

9 

2 

3 

1 

1 

1 

17 


For the checks we obtain 

9(0-112)+ 2(0-142)+ 3(- 0-099) = 0-995 - 1 = s 4 , 

9(0-014) + 2(0*267) H- 3(0*113) = 0-999 - 1 = s 5 , 

9(0-141) + 2( — 0*325) + 3(0-127) = 1-000 = s 6 

and S 1 +S 2 + S 3 +S 4 + S 5 + S 6 = 9 + 2 H- 3 +1 + 1 + 1 = 17 = S 7 . 

8.2. Determination of the inverse A~ 1 of a matrix A 

Consider the nxn system Ax = b k for the series of right-hand-side vectors 
b k (fc = 1, 2, ..., n) which are the successive columns of the nxn unit 
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matrix, that is, bi, b 2 , ..., b n are respectively the ^-dimensional vectors 



Let the respective solution vectors be denoted by x k (k = 1,2,..., n). Then 



and so jq = A 1 b v x 2 =A 1 b 2 ,...,x n — A 1 b n . 

Now form the nxn matrix X whose columns are the vectors jc 1? jc 2 , ..., 
x n , that is 

X — [xj, ♦ • •, 


= [A 1 6i, A ^ 2 ,..., A 'AJ 



where / is the n x n unit matrix. 

Hence X = yl -1 and so the separate solution vectors of Ax = b k are the 
columns of the matrix A~ l . 

Example . Keeping three decimal places throughout the calculations 
determine an approximation to the inverse of the matrix 



Using Gaussian elimination, we must solve the system of equations 
Ax = b k (k = 1,2,3) for the three right-hand-side vectors 
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This is precisely the problem which was solved on page 145. From the 
results obtained there we see that an approximation to the inverse of the 
given matrix is the matrix 


0112 

0014 

0*141\ 

0142 

0-267 

-0-325 J 

-0099 

0-113 

0-127/ 


EXERCISES 

1. Use Gaussian elimination to obtain approximations to the inverse of the matrices 
of the systems in exercises 1,2, 3, 4 at the end of §8.1. Hence obtain approximate 
solutions of the given systems and compare these with the solutions already 
obtained. 

8.3. Pivoting 

If all our calculations could be carried out exactly, that is, without 
introducing any round-off errors, then there would be no more to say 
about Gaussian elimination. However, in general, it will be neither possible 
nor practical to use exact arithmetic in our computations, and so great 
care must be taken to ensure that the accumulation of round-off errors 
does not become too great. 

To illustrate the problems involved we shall consider the solution using 
Gaussian elimination of the rather trivial two-by-two system. 

10 == 0*6 
Xi + Xz — 1. 

Of course, this system could be solved easily without Gaussian elimination 
but we shall use Gaussian elimination as the difficulties which then arise 
are the same as those which arise when solving higher-order systems. We 
shall use four significant figures in the calculations. 

W 2 1 = YqU = 10 5 = 1-000 x 10 5 


and so we obtain 

(1-000-1-000 x 10 5 )x 2 = 1-000 - 6-000 x 10 4 . 
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Then, using four significant figures, we have 

-1-000 x 10 5 x 2 = - 6-000 x 10 4 
and so x 2 = 6-000 x 10 “ 1 . 

The back substitution then gives 

X! = 10 5 (0-6 —x 2 ) = 0. 


Now the true solution of the given system correct to five decimal places 
can easily be obtained and is x A — 0-400 00, x 2 — 0-600 00. Thus the result 
we have obtained above for xi is far from satisfactory. 

Now let us solve the same system again, still using Gaussian elimination, 
but changing the order of the two given equations. Thus we start with 
the system 

Xi + x 2 = 1 
10 -5 xi -hx 2 — 0-6. 

This time m 2 \ = 10" 5 and so we obtain 

(1-000-1-000 x 10“ ■ 5 )x 2 - 0-6-1-000 x 10“ 5 . 

Hence, to four significant figures, 

l-000x 2 = 0-6000 

and so x 2 = 0-6000. 

Now the back substitution gives 

Xi = (1 x 2 ) 

= (1-0-6000) 

= 0-400. 

This result is much more satisfactory! We must now attempt to find out 
what went wrong with our first Gaussian elimination calculations so that 
we can, if possible, avoid falling into the same trap again. 

In the first solution, because of the relative smallness of the pivot, to 
obtain the coefficients of the modified second equation we had to add to 
each of the old coefficients of x 2 and the constant term, quantities so large 
numerically that, using only four significant figures, the old coefficients 
themselves were completely negligible; that is, the information contained 
in the second equation was completely lost. The back substitution in the 
first equation then resulted in Xi being evaluated as the difference of two 
“very nearly equal” numbers with a resulting loss of significant figures. In 
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fact in this example xi was calculated as the difference of two numbers 
which were equal to four significant figures, and so no significant figures 
were obtained in the result. 

In the second solution the pivot was large, and so the coefficients of the 
modified second equation were obtained by adding to the respective old 
coefficients quantities which were relatively small. Thus the information 
in the (then) second equation was not lost. The (then) first equation was, 
of course, used for the back substitution, so that the information contained 
in both of the given equations was utilized. 

8.3.1. Partial pivoting 

The most commonly used method for attempting to avoid the above 
difficulty is known as partial pivoting. It consists of choosing as pivot at 
each stage of the Gaussian elimination process the numerically largest 
coefficient in the left-hand column. This is equivalent to re-ordering, if 
necessary, the rows in each derived system; that is, to writing down the 
reduced equations in a different order. 

As an illustration of the effect of partial pivoting, consider the solution 
of the following system. 

07x1+46x2 — 16x3 = 13*3 
19 xi — 13x 2 + 23x3 = 14 
31xi+26x2 + 11x3 = 17. 


(The exact solution of this system is xj = -1, x 2 = 1, x 3 = 2.) 

Without using partial pivoting, and retaining three significant figures 
throughout the computations, we obtain the following: 


rriij 

Oi 1 

2 

3 

ai 4 

tfiS 

1 

0-7 

46 

-16 

13-3 

44 

27-1 

19 

-13 

23 

14 

43 

44-3 

31 

26 

11 

17 

85 

1 


-126 x 10 

457 

-346 

-115x10 

1*60 


-201x10 

720 

-572 

-186x10 

1 



—11*2 

-184 

-26-0- -29-6 

Roots 

-0620 

0869 

1-64 



S] 

507 

59 

18 

44-3 

172 


The relatively large discrepancy in the element in the check column at 
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the last stage of the elimination process indicates a dangerously large 
accumulation of round-off errors. Note too that the coefficients in the last 
row of the elimination process were obtained as differences of pairs of 
considerably larger numbers. 

Now S 1 X 1 +S 2 X 2 + S 3 X 3 =494 

which differs appreciably from the value 44*3 obtained for s 4 . This is a 
further indication that the solution we have obtained is rather inaccurate. 

Repeating this example using partial pivoting we obtain the results 
shown below. The pivot element at each stage of the process is printed in 
bold type instead of changing the order of the equations. 


m ij 

a n 

a i2 

a i3 

a i4 

a i5 

0-0226 

0-7 

46 

-16 

13-3 

44 

0-613 

19 

-13 

23 

14 

43 

1 

31 

26 

11 

17 

85 

1 


454 

-16-2 

12-9 

42-1 

-0-637 


-28-9 

16 3 

3-58 

-JMXT -9-02 

1 



5-98 

11-8 

17-8 

Roots 

-0-978 

0-987 

1-97 




50-7 

59 

18 

44-3 

172 

Here 

S 1 X 1 +S 2 X 2 + S 3 X3 

= 44.1 ~ 44-3 = s 4 . 



In this case, partial pivoting has resulted in a more accurate solution 
being obtained. It must, however, be pointed out that while partial 
pivoting will generally improve the accuracy of a solution (when working 
to a restricted number of significant figures) it cannot be guaranteed to 
do so in every case. 

EXERCISE 

Construct a flow diagram for Gaussian elimination with partial pivoting. 

8.3.2. Complete pivoting 

The success of partial pivoting suggests that perhaps we can do even better 
if we choose as pivot not merely the numerically greatest coefficient in the 
first column, but the numerically greatest coefficient in the whole matrix 
of coefficients at each stage. This is equivalent to reordering, if necessary, 
the rows and the columns in each derived system. The method using this 
technique is called complete pivoting. It is more awkward to record when 
the computations are being carried out manually, is more difficult to 
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program for a computer, and is more time-consuming than partial 
pivoting on a computer, since a two-dimensional scan of coefficients has 
to be made at each stage to determine the numerically largest instead of 
a one-dimensional scan. Further, it does not necessarily lead to more 
accurate results, and may indeed give poorer ones in many cases. This 
latter situation arises because the product of the (exact unrounded) pivots 
is equal to a fixed number (namely the determinant of the matrix of the 
given system). Hence the use of too large pivots to begin with may result 
in the final pivot being extremely small. 

Example. Solve the following system of equations using (i) partial 
pivoting and (ii) complete pivoting. 

x x — 19x 2 + 6x 3 + 72x 4 = 21 
16xi + 56x 2 + 21x 3 — 7x 4 = 47 
2 Xi + 7x 2 + 4x 3 + 3X4 = 7 
Xi+ 5 x 2 + 3x 3 + x 4 = 4. 

(The exact solution of these equations is x x = — 1, x 2 =2, x 3 = — 2, 
x 4 = 1.) (In both cases we shall retain three significant figures throughout 
the calculations although, because of round-off errors and cancellations, 
this will not always be three-figure accuracy.) 

The pivot elements are in bold type in both cases. 


(i) 


m ij 

a il 

« i2 


a i4 

a i5 

a i6 

00625 

1 

-19 

6 

72 

21 

81 

1 

16 

56 

21 

-7 

47 

133 

0*125 

2 

7 

4 

3 

7 

23 

00625 

1 

5 

3 

1 

4 

14 

1 


-22-5 

4*69 

72*4 

18 1 

72*7 

0 


0 

1*38 

3*88 

1 12 

6*38 

-00667 


1*5 

1*69 

1*44 

106 

5*69 

0*690 



1*38 

3*88 

1*12 

6*38 

1 



200 

6*27 

2*27 

10*5 

1 




-0*446 

-0*446 

-0*892 

Roots 

-100 

2*00 

-2*00 

1*00 



s ; 

20 

49 

34 

69 

79 

251 


Si -Xi + s 2 x 2 + s 3 x 3 + S 4 X 4 — 80 
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which does not differ too much from the value 79 obtained for s 5 . The 
product of the pivots ~ 321. 


(ii) 


m ij 

a n 

a i2 

a i3 

a i4 

«!5 

a i6 

1 

1 

-19 

6 

72 

21 

81 

-0-0972 

16 

56 

21 

-7 

47 

133 

00417 

2 

7 

4 

3 

7 

23 

00139 

1 

5 

3 

1 

4 

14 

1 

16-1 

54*2 

21-6 


490 

141 

0*144 

1*96 

7-79 

3-75 


6-12 

19 6 

00970 

0-986 

5-26 

2-92 


3-71 

12 9 

0-776 

-0-358 


0-640 


-0-936 

-jym -o*654 

1 

-0-576 


0*825 


-104 

-JOT7T-0*791 

1 

0*0890 




-0-129 

-J00402 -0*0400 

Roots 

-1*45 

2-25 

-2-28 

M0 



S J 

20 

49 

34 

69 

79 

251 


s x xi +S2X2 + S3X3 +S4X4 — 80 

which once again does not differ too much from the value of 79 obtained 
for s 5 . Note, however, the dangerous accumulation of round-off error 
which is suggested by the later values in the a i6 column. 

The product of the pivots ~ 287. 

On comparison with the exact solution of the given system it is clear 
that in this case the approximate solution using partial pivoting is better 
(for each of the values xi, x 2 , x 3 and x 4 ) than the approximate solution 
obtained using complete pivoting. As stated above, this is essentially 
because using complete pivoting resulted in the first two pivots being so 
large that the later ones were excessively small. 

Note that it would appear that the products of the pivots in (i) and (ii) 
are not nearly equal. The discrepancy between the two values is, however, 
also due to the build-up of round-off errors, mainly in the solution using 
complete pivoting. (The determinant of the matrix of the given system is 
312.) 

EXERCISES 

1. The system 

0000100X! + 1*00X2= 100 
l*00xi + l-00x2 = 2*00 
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has the solution 


*i = 1000 10, x 2 = 0-999 90 

rounded to the number of decimal places shown. Solve this system by Gaussian 
elimination, rounding to two significant figures, using (i) no pivoting, (ii) partial 
pivoting. Compare the results with the more accurate solution given above. 

2. Use Gaussian elimination with partial pivoting to obtain approximate solutions 
for the systems of equations given in exercises 2, 3,4 at the end of § 8.1. 

Compare your results with the solutions already obtained. Compute also the 
products of the pivot elements in each case and compare these with the 
corresponding products of the pivots when solutions were obtained without 
partial pivoting. 


8.4. Ill-conditioning 

An ill-conditioned system is one for which relatively small changes in the 
coefficients produce relatively large changes in the solution. It is one for 
which the solution is very sensitive to small changes in the coefficients, 
and so is one for which any uncertainties in the coefficients are con¬ 
siderably amplified in the solution. Such uncertainties will be present if, 
for example, the coefficients are the results of any observational measure¬ 
ments. Even if the coefficients are exact, small errors will be introduced 
during the calculations. Indeed it may happen that if the coefficients have 
been rounded or obtained experimentally (to a limited accuracy) then no 
meaningful solution can be obtained. The exact solution of the system with 
the rounded coefficients may have no significant figures in common with 
the exact solution of the same system with more accurate coefficients. This 
difficulty is not in the method of solution but is inherent in the system of 
equations itself. For such a system it is necessary to use many more 
significant figures in the calculations than are required in the final results. 

The whole problem of ill-conditioning and its recognition is a very 
interesting but rather complex one. However, pivots which decrease in 
magnitude at each stage, gradually tending to zero, give an indication of 
ill-conditioning. It is worth mentioning too that while a system with a 
nearly singular matrix will be ill-conditioned, it does not follow that an 
ill-conditioned system will have a nearly singular matrix. 

As an example of an ill-conditioned system consider the equations 

xi -f x 2 = 2 

X\ +1'0001x2 — 2*0001. 

Clearly the exact solution of this system is xi = 1, x 2 = 1. 
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However, if we now look at the system 

X\ + X 2 = 2 
+l-0001x 2 = 1-9999 

the exact solution is Xi = 3, X 2 = — 1. 

Thus changing one of the coefficients of the original system by approxi¬ 
mately 1 in 10 000 completely changes the solution. 

Also, the system 

Xi +X 2 = 2 

X!+0-9999x2 = 2-0001 

has the exact solution Xi = 3, X 2 = — 1 and so changing another coefficient 
of the original system by approximately 1 in 5000 also changes the 
solution completely. 


EXERCISE 

Compare the exact solutions of the two following systems 

Xi +0-99x 2 = 1*99 
0-99x!+0-98x2 = 1-97 
Xi +0-99 x 2 = : 2*00 
0-99xx+0-98x2 = 1-97. 


and 


First-order ordinary 
differential equations 



9.1. Introduction 

y' = 2 xy 

is a simple example of a first-order ordinary differential equation. It can 
be solved analytically as follows. We have 

% 

xdx 




and hence In y = 2(^x 2 ) + k 

where k is an arbitrary constant. 

Therefore In Ky = :: 2 where k=— lnX. 

1 


Hence 


y = Cc x where C = 


K 


Thus the general solution of the given differential equation is y = Ce* 2 
for arbitrary constants C. If now we are given the initial condition that 
y — 2 when x = 0, we obtain C = 2 and the corresponding particular 
solution of the given differential equation is y = 2e xl . 

When we solve a differential equation numerically, we cannot obtain a 
general solution as above but only a particular solution subject to some 
initial condition. Further, this numerical solution will not take the form 
of a functional expression as above but will simply be a sequence of 
numerical values of the solution function corresponding to a sequence of 
values of the independent variable. For example, the numerical solution 
of the differential equation y' = 2 xy with initial condition y = 2 when 
x = 0 is a sequence of numerical values of 2e* 2 corresponding to a 
sequence of values of x. 

The general problem we are concerned with in this chapter is that of 
obtaining numerical solutions of first-order ordinary differential equations 
of the form y' = /(x, y) subject to the initial condition y(x 0 ) = y 0 - Here 
x 0 and y 0 are given constants, and / is a given function such that the 
differential equation has a unique solution. Thus we are concerned with 
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determining a sequence of numerical values of y, corresponding to a 
sequence of values of x. In what follows we shall assume that the values 
of x in the sequence are equally spaced. We shall thus obtain values of y at 
equally spaced values of x. 

Now, in the special case when / is independent of y, we have 

y' = fix) 


and so 

Hence 

Therefore 


y'dx = 

J X o J xo 

-r 


f(x)dx. 




y(*i) = y(x 0 )+ 


f[x)dx. 


f(x)dx 


■r 


= yo+ f(x)dx. 


Now f*'f(x)dx can be evaluated approximately using one of the methods 
of numerical integration described in Chapter 6. Hence we obtain an 
approximation y t to the function value y(xi). Similarly we can obtain 
approximations y 2 , y 3 , ■ ■ ■, to the function values y(x 2 ), y(x 3 ),...; that is, 
we can obtain an (approximate) numerical solution of the given differential 
equation and initial condition. 


9.2. Euler’s method 

When / is not independent of y we have 

y' = fix, y) 

and hence, as above, 

y(xi) = y 0 + fix,y)dx. 

J x o 

The difficulty now is in the evaluation of \ x x l Jix, y)dx. Since the value of 
y is only known at x = x 0 , this is the only point at which the function 
f(x, y) can be evaluated and so the integral cannot be approximated 
using the trapezoidal rule or Simpson’s rule as in Chapter 6. However, 
if we assume that / is approximately constant in the interval xq ^ x ^ Xi, 
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then f(x,y) ~ /(x 0 , yo) in this interval and so we have 


I 


y(*i) — yo + /(x 0 , y 0 )dx 


— yo+/(* 0 , yo) 


dx 


= yo + hf{x 0 ,y 0 ) 

where xi — x 0 = K that is, we have obtained the approximation yi to y(xi) 
where 


yi = yo+hf(x 0 , y 0 )- 


Similarly then we obtain the numerical approximation y 2 to the value 
y(x 2 ) of the solution y at x 2 , where 

y 2 = yi+^/(xi,yi) and x 2 = x 1 + h. 

Next we can evaluate y 3 the numerical approximation to y(x 3 ) and then 
y 4 and so on. 

In general we have 

y r +i = y r + hf(Xr>yr) with x r+1 = x r + h,r = 0, 1 , 2 ,.... 

In this way we can generate a numerical approximation to the solution of 
the given differential equation at the points x r = x 0 + rh for r — 1, 2, 

3,-This method is called Euler's method. 

Graphically the method is illustrated in Figure 20. Starting at the point 
(*o> yo) the numerical solution moves along the tangent to the solution 
curve to the point (xi, yi). The error at this stage is y(xi)—yi. 
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Now the slope of the tangent to the solution curve at the point (xi, j/i) is 
calculated by evaluating /(xi, yO and the numerical solution proceeds by 
moving along this tangent to the point (x 2 , yi) where the process is again 
repeated and so on. 

9.2.1. Flow diagram 

A simple flow diagram for Euler’s method is as follows: 



The solution is started at x = x 0 and continued until x = x 0 + nh. 

Example. Use Euler’s method to obtain an approximate numerical 
solution of the differential equation y' = x 2 + 4x —\y with y = 4 when 
x = 0. (Use h — 0*05 and work to three significant figures throughout.) 
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Euler’s method for the equation y' = f(x , y) is given by 

JV +1 = y r +hf(x„y r ), r = 0,1,2,.... 

Now for the given differential equation we have f(x, y) = x 2 + 4x— \y 
and hence 

y r + 1 = y r + 0*05(x r 2 + 4x r -iy r ). 

Therefore yi = yo + 0*05(xg + 4xo-iyo) 

= 4 + 0*05(0 2 + 4 x 0—^ x 4) 

= 3-9. 

Then y 2 = y± + 0*05(x 2 + 4xi — \yi) 

= 3-9 + 0*05 ((0-05) 2 + 4(0*05) - i(3*9)) 

- 3*81. 

Continuing in this way we obtain the results shown in the following table. 
Now this particular differential equation and initial condition have an 
analytic solution. Values of this analytic solution, correct to three signifi¬ 
cant figures, are given in the last column of the table so that they can be 
compared with the corresponding values obtained using Euler’s method. 


x r 

yr 

f(Xr,y r ) 

y(x r ) 

0 

4 

-2 

4 

005 

3*9 

— 1*75 

3*91 

0-10 

3*81 

-1*50 

3*82 

0*15 

3*73 

-1*24 

3*76 

0-20 

3*67 

-1*00 

3*70 

0*25 

3*62 


3*65 


EXERCISES 

1* Use Euler’s method to obtain an approximate numerical solution of the 
differential equation / = 2xy with y = 2 when x = 0. 

(Use h = 0-05 and work to three significant figures throughout.) 

2. The method for obtaining an approximate numerical solution of the differential 
equation / — /(x, y) defined by 


y r +2 = yr + 2hf(x r+u y r+1 ) r = 0,1,2,... 

is known as the mid-point rule. Use this method to obtain an approximate 
numerical solution of the differential equation y' = x 2 + 4x—with y = 4 when 
x = 0, and y = 3-90 when x = 0*05, and compare this solution with that obtained 
in the text using Euler’s method. 

(Use h = 005; work to three significant figures throughout and continue the 
solution as far as x = 0*25.) 
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9.3. Truncation error in Euler’s method 

We must now attempt to obtain an estimate of some of the errors occurring 
during the application of Euler’s method. The value obtained as an 
approximation to y(xi) is given by 

yi = yo+hf(xo, yo)- 

However y(xi) = y(x 0 + fi) 

h 2 

= y(x 0 ) + hy'(xo) + jj y"(x 0 ) +... 

h 2 

= y(x 0 ) + hf(x 0 , y(xo)) + 2 j/'(x 0 , y(x 0 )) + - • • • 

Hence the error ei in our approximation is given by 

h 2 

ei =y(xi)-yi = 2 j/'(xo,yo)+--- 

if we assume that the initial value y 0 is equal to the exact function value 
y(xo). This error is called the local truncation error (at xi) of the method 
h 2 

and the first term —/'(x 0 , yo) in the series is called the principal local 

truncation error of the method. The error is local in the sense that it is the 
truncation error introduced due to the step from xo to Xi (assuming that 
yo = y(x 0 )). 

Consider now the step from xi to x 2 . We have 
y 2 = yi+hf(x u yi) 

h 2 

and y(x 2 ) = y(xi+h) = y(xi) + fcy'(xi)-l-—y"(xi) + -.. 

h 2 

= y(x 1 )+h/(xi, y(xi))+—/'(xi, y(xi)) + .... 

Hence the error now is 
e 2 = y(x 2 )-y2 

h 2 

= (y(xi)-yi) + h(f{xu y(xi))-/(x 1; yi)) + 2 j/'(xi, y(xj)) + .. ■ ■ 

This error is the total truncation error of the method at x 2 and is in part 
due to the truncation error made at the stage xi. If we assume that yi is 
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equal to the exact function value y(xi), then we are left only with the 
truncation error introduced due to the step from xi to x 2 , i.e. the local 
truncation error of the method at x 2 . Hence the local truncation error 
at x 2 is 

h 2 

= 2 jf(x u y 1 )+... 

since f(xi,y{xi)) = f(xi,yi) and f'(x 1 ,y{x 1 )) = f'(x t ,y 1 ) 
when y, = y(xj). 

The principal local truncation error at x 2 is 

Similarly, the local truncation error of Euler’s method at x r (r = 1, 2,...) is 

h 2 ™ 

2 y/ ( x r-l, yr-l) + ‘ * * 

and the principal local truncation error of the method at x r is 

h 2 


Note that 




Jl + f s ±. 

dx J dy 


Example. Evaluate the principal local truncation errors at x = 0 05 
and x = 0*25 in the solution of the equation 

/ = x 2 + 4x — jy 

with y = 4 at x = 0 given on page 159. 

The principal local truncation error at x r is 

h 2 ,,, . 


Here 


h = 0*05 and / = x 2 + 4x — \y. 
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Therefore /' = 2x +4— \(x 2 + 4x — jy) 

= ~jx 2 + 4 + iy. 

Hence the principal local truncation error at 0 05 is 
—(4 + |(4)) = 000625. 

The principal local truncation error at 0*25 is 

^^(-i(0-20) 2 + 4+i(3-67)) ~ 0-0061. 

EXERCISES 

1. Evaluate the principal local truncation errors at x = 0 05 and x = 0-25 in the 
solution of the equation / = 2 xy with y = 2 when x = 0. 

(Use the solution obtained to exercise 1 at the end of the last section.) 

2. Obtain an expression for the principal local truncation error of the mid-point 
rule given in exercise 2 at the end of the last section. 

9.4. Effect of decreasing step size 

It is clear that reducing the step size h will reduce the magnitude of the 
truncation error for any particular method. It does not follow however 
that a more accurate solution of the differential equation will necessarily 
be obtained after a few steps of the method. The reason for this is beyond 
the scope of this book. 
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Appendix 1 


Taylor series and the binomial expansion 

If a function / has continuous derivatives / (r) (r = 1, 2, ..., n) on an 
interval I which contains a then, for all x in I, 


fix) = f(a)+— rr f'(a)+ 


(x — a) 2 


/"(«) + *•• + 


(x-a)"-' 

(w —1)1 


"(aj + Rn, 


where the remainder term R n is given by 


Proof. The method used is that of induction. 

For n = 1, 

f{a) + Ri = /(a)+J f'it)dt 

= f(a) + {f(x)-f(a)} 

= fix)- 

Hence the result stated is true for n = 1. 

Now assume that the result is true when n = K where 1 < fc < n -1; 

that is, assume that 


= f(a) + ^YTf'ia) + - 


(x-oT^ ,, 
(fc — 1)! J 


1 

+ W^y. 


f w (t)(x-ty c ~ l dt 
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Then f(x) = f(a) + ^f'(a) +• • • + ^ » 


+ 


1 


(k- 1)! 






: 1 


f ik+i) (t)(x-t) k dt 


,\k- 1 




+ Tr n “ >+ *‘*‘ 


where 


^k+1 


1 

Y\ 


m x 

f k+1 \a)(x-t) k dt. 

a 


Thus the result is also true for n = /c +1. 

Hence, by induction, the result stated is true for all positive integers n. 
Now, if / has continuous derivatives of arbitrarily high order and if 
R n can be made as small as we please by taking n sufficiently large, then 
we write 


00 (x — aY 

/(*)= I 

n = 0 n\ 


The infinite series on the right-hand side is Taylors series expansion for 
the function / about a. 

In particular, if we take f(x) = (1 +x) p and a = 0 we obtain 


(1 -h x) p = 1 T X + 


p(p-l) 

2! 


n\ 


which is the binomial expansion of (l+x) p . This expansion is valid in 
general for — 1 < x < 1. (If p is a non-negative integer, the expansion 
terminates after a finite number of terms and is valid for all real values 
of x.) 

When p = — 1, we obtain 

(1-f x) -1 = 1 —x + x 2 — x 3 + - * * for — 1 < x < 1. 
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Here we prove that the nth differences of the nth degree polynomial 

a n x n +a„-ix n ~ i + ■'' + aix + ao (a„ # 0 ) 

have the constant value n! a n h n in which h is the difference between 
successive values of x at which the polynomial is evaluated. 

Consider the general first-degree polynomial aix + a 0 (ai # 0). Its 
difference table has the form 


f(x) 


Xo 

aiXo + ao 

ai(xi—x 0 ) 

Xi 

diXi+Clo 

fli(x 2 -xx) 

X 2 

aiX 2 + ao 

ai(x 3 -x 2 ) 

X3 

^1X3 + 

ai(x4-x 3 ) 

X4 



But (xi — Xo) = (X 2 —X 1 ) = (X3 — * 2 ) — ( x 4. * 3 ) h 

and so the first differences have the constant value a^h = 1 lah. Thus the 

theorem is true for n = 1. 

Now assume that the theorem is true for n equal to some positive 
integer m; that is, assume that the mth differences of an mth degree poly¬ 
nomial have the constant value m! a m h m . Consider the (m + l)th degree 
polynomial p m+ i(x) given by 

p m + i(x) = a m+ ix m+1 +a m x m + - \-a 2 x 2 + aiX + a 0 («m+i t 6 0). 

Form the function q(x) = p m + i(x + h) — p m + i(x). 

Then ^(x) = a m +i{(x + h) m+1 — x m+ ^ + a m {(x + h) m —x m } + - 

+ a 2 {(x + h) 2 -x 2 }+ai{(x + h)-x}. 

= (m+ 1 )a m +1 hx m + terms of integral degree less than m in x. 

Thus q(x) is a polynomial of degree m and so by our assumption its 
mth differences will have the constant value m! {(m + 1 )ci ,„, 1 h}h m , that is 
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the value ( m+l)\a m+1 h m+l . But the values of the first differences of 
p m+ i(x) are simply values of the polynomial q(x). For example, 

Pm +l(^l) Pm + l(^o) Pm+ l(^oF/l) Pm+ l(^o) = (ZC^o)- 

Hence the (m+l)th differences of the polynomial p m + i(x) will be mth 
differences of the polynomial q(x) and so will have the constant value 
(m+l)!a w+1 h m+1 . Thus, if the theorem is true for a positive integer m, 
it follows that it is also true for the positive integer (m+1). But we have 
shown that the theorem is true for the positive integer 1 and so, by 
induction, it must be true for all positive integer values of n. 
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. f* 2 f , . X-Xo * - ■ (X-XoMx-Xi) * 2 fLv 

Let 1 = J V 0+ ~1T A ^° + - 2ft 3 - A ) 

% X 2 

fodx = / 0 [x]3 = 2hf 0 . 

J xo 

X2X ^ Afodx = j- A/o[l(x — X 0 ) 2 ]Jo 
Jxo h h 

= 2/tA/o. 

Using integration by parts we obtain 

*’* A 2 f 0 dx = A- A 2 /o|[i(x-x 0 ) 2 (x-X 2 )]jg 

Jxo 2 h 2 2 h 2 l 

- i(x-x 0 ) 2 dx 

J XO 

= A A y 0 {2/i 3 -i[(x-x 0 ) 3 ]^} 

= A A 2/ o( 2fc*-$fc 3 ) 

= jhA 2 f 0 . 

Hence / = 2hfo + 2hAf 0 +jhA 2 f 0 

= ib{6/ 0 + 6(/i — /o) + {f2 ~ 2fi + /o)} 

= Wo + 4/i+/ 2 ). 
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Consider the polynomial 

p(x) = a n x n + a n - ix”" 1 +• * * + aix + a 0 ( a n # 0). 

We can obtain an estimate of £, the change in a root X due to a change 
a r in one of the coefficients a r , as follows. The quantity {X + e) is a root of 

d n x n + d n ~ iX n 1 + * * ‘ + (d r + Gl r )x r + ' * * "b UiX + do 

that is of p(x) + a r x r so that 

p(X + e) + d r (X + s) r = 0. 

But X is a root of p(x) so that p(X) = 0 . 

Hence p(X + e) — p(X) + ot r (X + e) r = 0 . 

Therefore (p(X ) + ep'(X) + •*•) — p(X) + a r (X r + rX r ~ l s + *••) = 0 . 

Then, to the first order in the quantities £ and a r , we have 

ep'(X) + oc r X r = 0 

and so £ = —ot r 

\ 

-7^) is large for a particular root X then a relatively small 

change oc r in the coefficient a r can produce quite a large change £ in the 
root A, that is, the given polynomial is ill-conditioned for this root. (This 
indication of ill-conditioning is still useful even when, due to the ill- 
conditioning itself, £ is not small and so the first-order approximation 
made above is quantatively very inaccurate.) Clearly then ill-conditioning 
will occur with nearly equal roots because of the smallness of p' in the 
neighbourhood of the roots, but it can occur in other cases also. We also 
see from the above expression for £ that an equation can be ill-conditioned 
for some of its roots and not for others, and that the smaller roots will 
tend to be better conditioned than the larger ones. Great care must be 


Hence, if 
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exercised when dealing with ill-conditioned roots. In such cases it will be 
necessary to work with many more significant figures than are required 
in the roots. It must be emphasized too that ill-conditioning is inherent 
in the equation itself and will have an effect whatever method is being 
used to determine the roots. 



Answers 


Chapter 2 

Section 2.4, page 7 

1. (i) 1*75; round-off error 0-0014; 

(ii) 0-0480; -0-00005; 

(iii) 349; 0-1862; 

(iv) 29-35; -0-00375; 

(v) 0-4863; -0-000049; 

(vi) 1-649; -0-0004997; 

(vii) 0-000498; 0-0000005; 

(viii) 5-0; 0-0032. 

2. (i) 1-751; 0-0004; 

(ii) 0 048; -0-00005; 

(iii) 349*186; 0-0002; 

(iv) 29-3462; 0-00005; 

(v) 0-4863; -0-000049; 

(vi) 1-6485; 0-0000003; 

(vii) 0-000; 0-0004985; 

(viii) 5-00; 0-0032. 


Section 2.6.4, page 19 

1. (i) 13-37 correct to two decimal places. Third decimal place is 4 or 5. 

(ii) 24-40 correct to two decimal places. Alternatively 28-403+0*001. 

(iii) 11-04 correct to two decimal places. To three decimal places it is either 
11-038 or 11-037. 

(iv) 8-68 correct to two decimal places. Third decimal place is 3 or 4. 

(v) —2-3 correct to one decimal place. Second decimal place is 4 or 5. 
Alternatively -6-345 + 0-001. 

(vi) 11-02 correct to two decimal places. Third decimal place is 0 or 1. 

(vii) No significant figures obtainable. Sign of number indeterminate. 

2. Better to add numbers and then round-off the total. 

3. (i) 147-5. To two decimal places the true value may be 147-49,147-50 or 147-51. 

That is 147-50+0-01. 

(ii) 0-094. The fourth decimal place is 2, 3 or 4. Alternatively 0-0943 + 0-0001. 

(iii) 345-9 or 345-92 + 0-03. 

(iv) 1-934 or 1-9340 + 0-0002. 

(v) 0-00678. To six decimal places the true value may be 0-006779 or 0-006780. 
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4. From tables 72-14 = 1-4629 to four decimal places. 

1 /0-005^ 

Approximate absolute error is — I --— 


~ 0 - 002 . 


Hence Ja = 1-463±0 002. 

From tables 78-27 = 2-8758 to 4 decimal places. 

• 1/O°05\ ftm1 

Approximate absolute error is - II _ u wi. 

Hence 7b = 2-876 ±0-001. 

5. From six-figure tables sin0-359 ^ 0-351338. ^ n n005 

Approximate maximum absolute error is 0-0005 cos 0 35 
Hence sin a = 0-3513 ±0-0005. 

From six-figure tables sin 0-745 0-677972. _ 0-0004 

Approximate maximum absolute error is 0-0005 cos 07 5 

Hence sin b = 0-6780 ± 0-0004. 

Section 2.7, page 22 

1. 0-0228. 

2. 17-9, 0-0559 (last digit not reliable. Approximate absolute error less than or 
equal to 0-0002). 

3 _ 100 0*0400 

' Value'of polynomial at x = 100 is -4 (quite different from zero!). 

(x^lOO) (x —0-0400) = x 2 + 99-96x—4 which is a good approximation to the 

given polynomial. 

1—cosx , 

4. — : -= tan ix. 

sinx 

_1_1 

7 X 2 = _ Zl _ -0-0637 to 3 significant figures. 

x—4 2 ( 27 x+x) 


Chapter 3 

Section 3.1.1, page 27 

1. -0-71875. 

2. -6-032. 

Section 3.2, page 29 

1. 5x 3 + 13x 2 4- 25x + 56,105. 

2. 3x 6 —8x 5 + 20x 4 - 58x 3 + 176x 2 + 527x- 1580, 4739. 

3. -1-66108. 

4. 1-33994. 
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5. p(x) = 



q(x) + R — {ax~ b) 



+ R. 


0-5x 3 - l*25x 2 - 4-875x + 84375, 21-5625. 


Chapter 4 

Section 4.1.4, page 33 

Sf- 1/ 2 = -0-05, S 2 f 3 = 0-0040, S% 2 = 0-1998, txf 2 = -0-0179, A 2 / 3 = 0-0028, 
A 3 /_ 3 = -0-0501, V/ 3 = -0-0179, V 3 /_ x = -0-2499, = 0-0017. 


Section 4.4, page 37 

3x 2 -5x + 0-25. 

This is a bad fit to the given function at intermediate values of x. 


Chapter 5 

Section 5.1.2, page 46 

1. 1-9217, 1-9102, 1-9256. 

2. 5-357. 

3. 5-956. 

4. 0-175. 

Section 5.2, page 48 

1. 5-358. Quadratic interpolation gives better result. 

2. 5-954. Quadratic interpolation gives better result. 
5-6143. 

3. 0-164. Quadratic interpolation gives better result. 

Section 5.3, page 51 

1. 14-93. 

2. 1-348. 

3. 2x 2 -3x4 0-75. 

4. -3x 2 44x4 1-25. 


Chapter 6 

Section 6.1.5, page 67 

1. 0-14752,0 15216,0-15332, 0-15361. 

2. (i) 0-70833. | Round-off error | ^ 0-000005. [Truncation error | ^0-04. 

(ii) 0-69702, 0-000005, 0-01. 

(iii) 0-69412, 0 000005, 0-003. 
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3. Take h = 0 09 and retain four decimal places. 

0-2624 (True value rounded to five decimal places is 0-26236). 

Section 6.2.4, page 73 

1. 1-4939, 1-4014,1-3974. 

2. (i) 0-69445,0000005,0-008. 

(ii) 0-69326,0-000005,0-0005. 

(iii) 0-69315,0-000005, 0-00003. 

3. Take h = 015 and retain four decimal places. 0-262. 


Section 6.3, page 74 . , ^ i , i, t £ m„i6 _ n.nnf, 

1. (i) Truncation error is negative and has magnitude g Tf (l-6-0)e - 0 U20. 

Magnitude of round-off error g nhh 10"* - 0-8 x 10"* = 00008 rounding 
to three decimal places. This is much less than the possible magnitude of 
the truncation error and so there is no need to use any more than three 
decimal places in the working. 

Then f e x dx a 3-966. 


Jo 

But since maximum truncation error is estimated by -0*026 we see that 
the value of the integral lies between 3*97 and 3*94. 

(ii) Maximum truncation error is estimated by -0*00007. 

Maximum round-off error (using all the available figures) is estimated by 


0*00008. 


Then j* e x dx ^ 3*95305. 

Using the above error estimates we see that the value of the integral lies 
between 3*95313 and 3*95290 and so is 3*953 to three decimal places. 

2. Using the trapezoidal rule and Simpson’s rule and successively halving the 
interval size h we obtain in both cases the value 0*361. 

(Analytically the true value of the integral rounded to four decimal places is 
0*3614.) 

3. 1*91. True value rounded to three decimal places is T912. 


Section 6.4.2, page 84 

1. ^ h , \h. (c.f. trapezoidal rule). 

2. iK f/z, jh, (c.f. Simpson’s rule). 

3. i if, A- 

h __h_ 

4 ' 73 ’ \/ 3 ' 

I Truncation error | is much less than x 32 x 10 5 ^ 0*000008. 

| Round-off error | < -%f-(9 +19 + 5 4* l)il0 4 — 0*00001. 
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Chapter 7 

Section 7.2, page 88 

1. 0*424. 

2. Process cycles round 0*514,0*508,0*514, 

Section 7.3.1, page 98 

1. [0,i]. 


x 3 increases in (0, ?], | x — 11 decreases and so \F'\ decreases. 

Hence min|F| in (0,i] is |F(i)l = 4 > 1. Therefore \F'(X)\ > 1. 

Hence method is not suitable. 

x 0 = 0*4, Xi = 0*625, x 2 = —0*320, x 3 = 8*008. 

3 ' F(X) = 3(i-x) 2/3 ' 

(i—x) decreases and so | F' | increases as x increases from j to 

Hence min | F' \ in [}, i) is | F(i) | = > 1. 

Therefore | F\X) \ > 1 and so the method is not suitable. 

x 0 = 0*4, Xi = 0*464, x 2 = 0*330, x 3 = 0*554, x 4 = —0*378, x 5 = 0*958. 

4. (0, n). 

5. /( —4) = -20,/(-3) = 7,/(0) = 4,/(l) = -5,/(2) = -8,/(3) = 1. 

Hence roots in stated intervals. 

f< 1 for xe[0,1] 

F'(x) = t$x 2 which is< > > 1 for xe[—4, —3] 

( > r§ > 1 for xe[2,3]. 

Hence given formula is suitable for the root in [0,1] but not for the other two 
x„ +1 = (10x„—4) 1 ' 3 
04067, —3-35, 2-94. 

6. /(0) = i/(^) = i-k<0. 

Hence there is a root in (0. ^tt). 

X/j+ i = sin x„ + j. 

1*497. 

Section 7.5, page 107 

1. 0*407, 2*94. 

2. 0*51. 

Section 7.6, page 111 

1. 0*424. 


2. 0*51. 



ANSWERS 

3. 2*94. 

4. 0407. 

Section 7.7.6, page 121 

1. 0424. 

2. 0-51. 

3. 1497. 

4. 1456. 

5. 0-567. 

7. x„+i = x„(2 — ax„). 

8. 1-857, 4-536. 

9. 0-787. 

Section 7.8, page 126 

1. 0407, -3-35,2-94. 

2. -0-170, 1-689, 3481. 

3. 1-040, -1-341. 

Section 7.9, page 131 

1. 2*396,1-576. 

2. 0-768,0*732. 


Chapter 8 

Section 8.1, page 139 

1 . 1 - 000 , 2 - 000 , - 1 - 000 . 

2. 1-183,0-035,0484. 

3. 179, -138, -45-5. 

4. 549, 1*01, -0-0306. 


Section 8.2, page 

147 

/ 

' 0-1062 

1. Inverse 1 

-0-06233 
v -0-1813 

Roots 

0-9996, 

1 

( 01145 

Inverse 1 

-0-07341 
V-0-02804 


0-2062 

—0-01250\ 

-0-06233 

0-1250 ) 

0-1187 

-0-03750/ 

2-001, 

- 1 - 000 . 

0-2844 

— 0-06913 \ 

-0-08769 

0-08205 ) 

-0-2382 

0-1490 / 
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Roots 

1*1787, 0*0379, 0*4899. 

Inverse 

/—1*04 0*314 

0*817 0*0486 

\ 0*296 -0*141 

- 0*0869 \ 
-0*307 1 

0*108 J 

Roots 

177, -138, 

-45*5. 

Inverse 

/2-14 1*26 

0*465 0*277 

\0*135 -0*0466 

0*123 \ 
-0*00695 ) 

0*00122 J 

Roots 

4*53, 0*993, 

-0*0280. 


Section 8.3.2, page 152 

1. (i) 0-00,1*00. 

(ii) 1-00, 1-00. 

2. 1*181,0*034, 0*483 (True solution 1*273080, 0*034557, 0*483801) 

175, -139, —443 (175*624, -138*661, -44*6452) 

0*0186, 1*23, -0*0701 (0*01866, 1*231, -0*06993) 


Section 8.4, page 154 

Solution of first system Xi = 1, x 2 = 1. 

Solution of second system x t — —97, x 2 = 100. 


Chapter 9 


Section 9.2.1, 

1. X 

page 159 

y 

2. x 

y 

0 

2 

0 

4 

0*05 

2 

0*05 

3*90 

0*10 

2*01 

0*10 

3*82 

0*15 

2*03 

0*15 

3*75 

0*20 

2*06 

0*20 

3*70 

0*25 

2*10 

0*25 

3*65 


Section 93, page 162 

1. 0*005, 0*006. 

2. jh 3 y'"(x r ). 



Index 


accuracy 8 

Aitken’s <5 2 -process 107 

back substitution 133 
binomial expansion 163 
Birge-Vieta process 122, 129 

derivatives, relationship with differences 37 
differences, backward 31 
central 31 
forward 31 
of polynomials 34, 165 
relationship with derivatives 37 
differential equations, first-order 155 
Euler’s method for solution of 156 
mid-point rule for solution of 159 
differentiation, numerical 38 

equations, with nearly equal roots 126 
solution of linear systems of 132 
ill-conditioned systems of linear 153 
errors 4 
absolute 7 

effects of, in functional evaluation 18 
effects of, on addition 9 
effects of, on division 16 
effects of, on multiplication 13 
effects of, on subtraction 9 
of the method 6 
relative 7 
round-off 4 

in linear interpolation 44 
in Simpson’s rule 58 
in trapezoidal rule 56 
in difference tables 33 
reducing 20 

statistical treatment of 23 
truncation 6 

in Euler’s method 160 
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INDEX 


in Newton’s forward-difference method of degree one 45 
in Simpson’s rule 58 
in trapezoidal rule 56 
local 160 
principal local 160 
principal 45 

Euler-Maclaurin formulae 81 
Euler’s method 156 
truncation error in 160 
local 160 
principal local 160 

false position, rule of 103 
finite difference table 30 
round-off errors in 33 

Gaussian elimination 133 
checking procedures for 139 
complete pivoting in 150 
for determination of inverse matrix 145 
partial pivoting in 148 
pivoting in 147 

with multiple right-hand sides 143 
Gaussian formula 84 

ill-conditioned equation 127, 168 
system of linear equations 153 
integration 52 
Euler-Maclaurin formulae 81 
Gaussian formula 84 
method of undetermined coefficients 75 
Newton-Cotes formulae 75 
Simpson’s rule for 67, 78 
application of 71 

comparison with trapezoidal rule 73 
round-off error in 69 
truncation error in 69 
trapezoidal rule for 54, 76 
application of 58 
comparison with Simpson’s rule 73 
round-off error in 56 
truncation error in 56 
with interval halving 60 
interpolation 40 
higher-order formulae 48 
linear 41 

round-off error in 44 
truncation error in 44 
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quadratic 46 

inverse matrix, determination of 145 
iteration 86 

Aitken’s <5 2 -process 107 
Birge-Vieta process 122, 129 
convergence of 90 
first-order 100 

for equations with nearly equal roots 126 
Newton-Raphson method 111 
convergence of 115 
geometrical interpretation of 112 
order of 118 

particular applications of 118 
practical approach to 117 
order of 99 

rule of false position 103 
second-order 100 
simple 86 

linear interpolation 41 
round-off error in 44 
truncation error in 44 

mid-point rule 159 
mistakes 6 

nesting 24 

Newton-Cotes formulae 75 
Newton-Raphson method 111 
convergence of 115 
geometrical interpretation of 112 
order of 118 

particular applications of 118 
practical approach to 117 
Newton’s forward-difference formula, 
of degree one 41 

round-off error in 44 
truncation error in 45 
of degree n 48 
of degree two 46 

pivotal equations 136 
pivoting 147 
complete 150 
partial 148 
pivots 136 

polynomial, determination of roots of 122 
differences of 34, 165 
evaluation of (nesting) 24 
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INDEX 


fitting, to function values 35, 50 
synthetic division of 28 
precision 8 

quadratic interpolation 46 

round-off errors, see errors , round-off 

Simpson’s rule 67, 78 
application of 71 

comparison with trapezoidal rule 73 
round-off error in 69 
truncation error in 69 
synthetic division 28 
use of, in Birge-Vieta method 123 
systems of linear equations, ill-conditioned 153 
solution of 132 
using Gaussian elimination 133 
triangular 133 

Taylor series 163 
trapezoidal rule 54, 76 
application of 58 
comparison with Simpson’s rule 73 
round-off error in 56 
truncation error in 56 
truncation error, see errors , truncation 

undetermined coefficients, derivation of trapezoidal rule by 76 
derivation of Simpson’s rule by 78 
method of 75. 




Numerical Analysis 


his text for sixth-year school pupils and first-year 
college and university students provides an introduction 
to many important topics in numerical analysis, including 
interpolation, integration, iteration, solution of sets of 
simultaneous linear equations and solution of ordinary 
differential equations. 

A middle course is steered between theory and 
computation. An attempt is made to provide the reader 
with a foci ity in the use of specific methods and also an 
understanding of their limitations. 

Many exercises and worked problems are provided, 

Ihe basic material provided should pave the way for a 
more advanced study of the subject. 























